Computer system for sharing I/O device

ABSTRACT

Provided is a computer system in which an I/O card is shared among physical servers and logical servers. Servers are set in advance such that one I/O card is used exclusively by one physical or logical server, or shared among a plurality of servers. An I/O hub allocates a virtual MM I/O address unique to each physical or logical server to a physical MM I/O address associated with each I/O card. The I/O hub keeps allocation information indicating the relation between the allocated virtual MM I/O address, the physical MM I/O address, and a server identifier unique to each physical or logical server. When a request to access an I/O card is sent from a physical or logical server, the allocation information is referred to and a server identifier is extracted from the access request. The extracted server identifier is used to identify the physical or logical server that has made the access request.

CLAIM OF PRIORITY

The present application claims priority from Japanese applications JP2006-194534 filed on Jul. 14, 2006 and JP2005-340088 filed on Nov. 25, 2005, the contents of which are hereby incorporated by reference into this application.

BACKGROUND

This invention relates to a computer system in which a plurality of servers are integrated into one, and more particularly, to a computer system in which an I/O card is shared between a physical server and a logical server.

Recent improvement of computer performance, especially the progress of multicore processor technology, has contributed to increased adoption of a cost reduction method in which processing that has conventionally been distributed among a plurality of servers is concentrated into one server. An effective way to concentrate processing in a server is to run a plurality of operating systems on one server by utilizing server partitioning.

Server partitioning is classified into physical partitioning, which allocates OSs on a node basis, “node” being a processor, a memory, or the like, and logical partitioning, which virtualizes a physical processor and a physical memory to create an arbitrary number of logical partitions in a computer. Physical partitioning and logical partitioning each have advantages and disadvantages.

With physical partitioning, it is not possible to partition a server into more pieces than the number of physical resources included in the server, and therefore only a limited number of servers can be integrated into one. On the other hand, since servers integrated by physical partitioning each have exclusive use of a hardware resource, the server performance is high.

Logical partitioning is implemented by firmware called a hypervisor or by virtualization software. In logical partitioning, operating systems (called guest OSs) are executed on logical processors provided by a hypervisor. With a plurality of logical processors mapped onto a physical processor by the hypervisor, the unit of partitioning is finer than a node. Furthermore, when it is a processor that is partitioned, one physical processor may be switched in a time-sharing manner among a plurality of logical partitions. In this way, more logical partitions than the number of physical processors can be created and run concurrently with one another. However, in logical partitioning, the intervention of virtualization software causes overhead, which makes logical partitioning inferior to physical partitioning in terms of performance.

Thus physical partitioning and logical partitioning each have advantages and disadvantages, and it is therefore desirable to combine the two flexibly to suit the purpose.

Server integration utilizing physical server partitioning and logical server partitioning has the following problems.

First, this type of server integration requires an I/O card for each server, and the number of servers that can be integrated is limited by the number of I/O cards that can be mounted. A second problem is that the server integration can degrade the overall system performance in some cases.

The prior art given below is known as a solution to the above-mentioned problems.

JP 2004-220218 A discloses a technique for preventing the I/O performance of a logically partitioned server from dropping. This technique accomplishes direct memory access (DMA) from an I/O card in a logical server environment by preparing a conversion table for conversion between a guest DMA address shown to a logical server and a host DMA address in main storage.

A similar technique is disclosed in an online article, Intel Corporation, “Intel Virtualization Technology for Directed I/O Architecture Specification” retrieved Jul. 6, 2006 at an Internet URL http://www.intel.com/technology/computing/vptech/. This technique extends to I/O devices the Translation Look-aside Buffer (TLB), which is a mechanism for converting a virtual address into a physical address when a processor accesses memory. By entering different conversion entries for different I/O devices, the same guest DMA address can be converted into different host DMA addresses.

In addition to the above-mentioned hypervisor, server virtualization software disclosed in U.S. Pat. No. 6,496,847 is another known example of software that partitions one computer into an arbitrary number of logical partitions. The server virtualization software makes sure that a host OS intervenes in I/O access from a logical server (guest), thereby enabling a plurality of logical servers to share an I/O card. The number of necessary I/O cards can thus be reduced among logical servers.

SUMMARY

The above prior art can solve one of the two problems mentioned above, but not both of them at the same time.

JP 2004-220218 A is built on the premise that I/O cards and logical servers are on a one-on-one basis, and a logical server is identified by an I/O card from which a DMA request is made. JP 2004-220218 A is therefore not applicable to a computer system in which an I/O card is shared.

In a computer where an I/O is accessed via a host OS as in U.S. Pat. No. 6,496,847, DMA transfer from an I/O requires memory copy between the host OS and a guest OS, which makes lowering of performance due to increased overhead unavoidable. To elaborate, since I/O access between a guest OS and an I/O card is always relayed by the host OS, such cases as DMA transfer between a guest OS and an I/O card require processing of transferring data from a memory area of the host OS to a memory area of the guest OS, and the host OS processing causes overhead that lowers the I/O access performance (transfer rate and response).

This invention has been made in view of those problems, and it is therefore an object of this invention to share an I/O card among a plurality of servers while avoiding lowering of I/O access performance, and to provide a computer system that accomplishes DMA transfer despite sharing of an I/O card among a plurality of physical servers and logical servers.

According to an aspect of this invention, a computer system includes: at least one node composed of at least one processor and memory; an I/O hub connecting at least one I/O card; a switch connecting the node and the I/O hub; and a server run by one or a plurality of the at least one node, in which the server is set in advance to allow one of exclusive use and shared use of the I/O; the I/O hub allocates a virtual MM I/O address unique to each server to a physical MM I/O address associated with an I/O card; the I/O hub keeps allocation information indicating the relation between the allocated virtual MM I/O address, the physical MM I/O address, and a server identifier unique to the server; and, when one server sends a request to access an I/O card, the I/O hub refers to the allocation information to identify the server that has issued the access request using a server identifier that is extracted from the access request.

This invention enables a plurality of servers to share an I/O card. This removes the limit that the number of I/O cards mountable to a computer system has imposed on the number of servers that can be integrated, and can lead to more flexible server configuration and effective use of hardware resources.

This invention also makes DMA transfer from an I/O card possible irrespective of whether an I/O card is shared or not, which can reduce adverse effect on performance even when an I/O card is shared among a plurality of servers.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a configuration block diagram of a computer system according to a first embodiment of this invention.

FIG. 2 is a configuration block diagram of an I/O processor according to the first embodiment of this invention.

FIG. 3 is a function block diagram of an I/O card sharing module according to the first embodiment of this invention.

FIG. 4 is an explanatory diagram of an example of I/O card sharing settings according to the first embodiment of this invention.

FIG. 5 is an explanatory diagram of an example of an MM I/O write system Tx according to the first embodiment of this invention.

FIG. 6 is an explanatory diagram of an example of an MM I/O write PCI Tx according to the first embodiment of this invention.

FIG. 7 is an explanatory diagram of an example of a DMA request PCI Tx according to the first embodiment of this invention.

FIG. 8 is an explanatory diagram of a guest qualified DMA address according to the first embodiment of this invention.

FIG. 9 is an explanatory diagram of an example of a DMA request system Tx according to the first embodiment of this invention.

FIG. 10 is an explanatory diagram outlining initialization processing of an MM I/O initializer according to the first embodiment of this invention.

FIG. 11 is an explanatory diagram of an example of an MM I/O area allocation table according to the first embodiment of this invention.

FIG. 12 is an explanatory diagram of an example of an MM I/O address conversion table according to the first embodiment of this invention.

FIG. 13 is a flow chart for the initialization processing of the MM I/O initializer according to the first embodiment of this invention.

FIG. 14 is a flow chart for virtual MM I/O allocation processing according to the first embodiment of this invention.

FIG. 15 is a flow chart for physical initialization processing according to the first embodiment of this invention.

FIG. 16 is an explanatory diagram showing the relation between a virtual MM I/O address and a physical MM I/O address according to the first embodiment of this invention.

FIG. 17 is an explanatory diagram of an address map in an I/O P memory according to the first embodiment of this invention.

FIG. 18 is an explanatory diagram showing MM I/O write processing according to the first embodiment of this invention.

FIG. 19 is an explanatory diagram showing processing of a DMA request decoder according to the first embodiment of this invention.

FIG. 20 is an explanatory diagram of an example of a DMA address conversion table according to the first embodiment of this invention.

FIG. 21 is an explanatory diagram showing a guest MM I/O area and a guest memory area in a command chain according to a modified example of the first embodiment of this invention.

FIG. 22 is a configuration block diagram of a computer system according to a second embodiment of this invention.

FIG. 23 is a block diagram of a blade server system according to a third embodiment of this invention.

FIG. 24 is a block diagram of an I/O sharing module according to the third embodiment of this invention.

FIG. 25 is an explanatory diagram of a PCI transaction according to the third embodiment of this invention.

FIG. 26 is an explanatory diagram of a memory space in a server according to the third embodiment of this invention.

FIG. 27 is an explanatory diagram of a memory space in an I/O processor blade according to the third embodiment of this invention.

FIG. 28 is an explanatory diagram showing an example of an address information table according to the third embodiment of this invention.

FIG. 29 is an explanatory diagram showing an example of an I/O card sharing settings table according to the third embodiment of this invention.

FIG. 30 is a block diagram of the relation between the I/O card sharing mechanism and the I/O processor blade according to the third embodiment of this invention.

FIG. 31 is an explanatory diagram showing the flow of I/O card sharing processing according to the third embodiment of this invention.

FIG. 32 is a time chart showing the flow of the I/O card sharing processing according to the third embodiment of this invention.

FIG. 33 is a time chart showing an example of address information setting processing, which is executed when a server blade is activated according to the third embodiment of this invention.

FIG. 34 is a block diagram of a blade server system according to a fourth embodiment of this invention.

FIG. 35 is an explanatory diagram of a memory space in a server according to the fourth embodiment of this invention.

FIG. 36 is a time chart showing the flow of I/O card sharing processing according to the fourth embodiment of this invention.

FIG. 37 is an explanatory diagram of a memory space in an I/O processor blade according to the fourth embodiment of this invention.

FIG. 38 is a block diagram of a blade server system according to a fifth embodiment of this invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Hereinafter, embodiments of this invention will be described with reference to the accompanying drawings.

First Embodiment

First, the outline of a first embodiment of this invention will be described.

A computer system according to the first embodiment of this invention is configured such that a plurality of physical servers and a plurality of I/O hubs are connected to each other via a switch. It should be noted that the physical server may include a plurality of logical servers configured by a hypervisor. On each of the physical servers and logical servers, an operating system (hereinafter, referred to as “OS”) is run, and an application is run on the OS. It should be noted that the OS which is run on each of the physical servers and logical servers, and each application which is run on the OS are called “guest”.

The I/O hub has an I/O card and an I/O bridge connected thereto. A plurality of I/O cards can be connected to the I/O hub via the I/O bridge. The I/O hub includes an I/O processor having a function of monitoring an access to a memory mapped I/O (MM I/O) area of the I/O card, and arbitrating the access in a case where two or more guests access the same I/O card. The I/O processor is composed of a dedicated processor and memory provided in the I/O hub. It should be noted that the I/O processor may be realized by a part of resources of the physical servers or the logical servers in the computer system.

The computer system is connected to a service processor (hereinafter, referred to as “SVP”) so that the SVP can communicate with the physical servers and the I/O cards. The SVP sets mapping between the physical servers and the logical servers, and the I/O cards used by the physical servers and the logical servers in response to an instruction from an administrator or a user. It should be noted that the SVP includes a setting console which accepts settings by the administrator or the user.

Next, the outline of an operation of this invention will be described.

First, when the computer system is initialized, the MM I/O area associated with the I/O card to be used by the physical servers and the logical servers is secured in a memory space by firmware or the hypervisor. According to this invention, the hypervisor or the firmware secures a virtual MM I/O area for an I/O card which may be shared by a plurality of guests. The virtual MM I/O area is shown to the respective guests. The guests access the virtual MM I/O area, thereby making it possible to access an allocated I/O card. The virtual MM I/O area is partitioned into pieces according to the number of guests sharing the I/O card. Each guest is guaranteed to be allocated a different MM I/O address.

The I/O hub has a conversion table containing virtual MM I/O addresses, physical MM I/O addresses which are the real MM I/O addresses of the I/O cards, and guest identifiers for identifying each guest.

After that, an access to the I/O card from the guests is executed. In this case, each guest issues an access to the virtual MM I/O address. The I/O hub refers to the conversion table, and converts the virtual MM I/O address into the physical MM I/O address and into the guest identifier. Thus, the I/O hub can identify each guest.
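
As a non-limiting illustration, the conversion described above may be sketched in Python as follows. The table layout, the window size, the addresses, and the names ConversionEntry, TABLE, and convert are assumptions introduced only for this sketch and are not part of the embodiment.

```python
from typing import NamedTuple, Optional, Tuple


class ConversionEntry(NamedTuple):
    virtual_base: int   # start of one guest's virtual MM I/O window
    size: int           # window length in bytes
    physical_base: int  # real MM I/O base address of the I/O card
    guest_id: int       # identifier of the guest that owns the window


# Hypothetical table: two guests share one card whose registers start at 0xA000.
TABLE = [
    ConversionEntry(0x10000, 0x1000, 0xA000, guest_id=1),
    ConversionEntry(0x11000, 0x1000, 0xA000, guest_id=2),
]


def convert(virtual_addr: int) -> Optional[Tuple[int, int]]:
    """Return (physical MM I/O address, guest identifier), or None if unmapped."""
    for entry in TABLE:
        offset = virtual_addr - entry.virtual_base
        if 0 <= offset < entry.size:
            return entry.physical_base + offset, entry.guest_id
    return None
```

In this sketch, two different virtual addresses resolve to the same physical register, and the window in which the virtual address falls is what tells the two guests apart.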

Next, in order to accomplish a direct memory access (DMA) transfer from the I/O card shared by the guests, the following processing is executed by using the guest identifiers.

The I/O processor traps writes to the MM I/O area unique to the I/O card. A trapped write to the MM I/O area is redirected to a memory area (i.e., I/O P memory) exclusively used by the I/O processor, whereby the data is held in that memory area. Among the writes, for an access that sets a DMA address, the guest identifier is embedded in the high-order bits of the physical address to be set, thereby making it possible to identify the guest that is the requester of the subsequent DMA access.

Upon trapping the access to a command register, the I/O processor transfers contents of the redirected memory area to a corresponding physical MM I/O area. As a result, the actual access to the I/O card is executed, and a request for DMA is set.

Upon receiving the setting of the request for DMA, the I/O card starts the DMA transfer. In this case, the I/O hub extracts the guest identifier from the high-order bits of the address for the DMA transfer, refers to a guest conversion table, and converts the guest physical address into a host physical address, thereby making it possible to accomplish DMA in which the I/O card directly accesses the guest.

FIG. 1 is a configuration block diagram of an example of a computer system according to the first embodiment of this invention.

In the computer system shown in FIG. 1, one or more physical servers 300 (300A to 300C) and one or more logical servers 310 (310C1, 310C2) configured by the physical servers 300 are connected to each other via a switch 600.

The physical server 300 is composed of one or more nodes 200 (200A to 200C). Each node 200 includes one or more CPUs 100 (100A, 100B), a memory 120A, and a north bridge 110A. The CPUs 100 and the memory 120A are connected to each other via the north bridge 110A.

It should be noted that the CPU 100 may be configured such that the CPU 100 has a function corresponding to the north bridge 110A and the CPU 100 and the memory 120A are directly connected to each other. The following description applies to either of the cases. The following description also holds true of a multi-core CPU which includes a plurality of CPU cores in a single CPU 100.

In addition, it is possible to configure a single physical server 300 by a plurality of nodes 200. In this case, the plurality of nodes 200 are connected to one another via the switch 600, and a symmetric multiple processor (SMP) is composed of the CPUs 100 in the plurality of nodes 200.

Further, a hypervisor 500C is operated on a single physical server 300, and the resource of the physical server 300 is partitioned through processing of the hypervisor 500C, thereby making it possible to operate a plurality of logical servers 310 on the single physical server 300. An OS 400 (400A to 400C2) is operated on each of the plurality of logical servers 310. The respective logical servers 310 exclusively use a computer resource of the node 200 run on the physical server 300, that is, a part of the CPU 100 or the memory 120A. Alternatively, the computer resource is shared in a time-sharing manner.

The switch 600 is connected to one or more I/O hubs 700 (700A, 700B). Each I/O hub 700 is connected to a plurality of I/O cards 800 (800A to 800D) via one or more I/O buses 850. In addition, the I/O hub 700 may be connected to further I/O cards 800 via an I/O bridge 810.

The I/O card 800 includes a direct memory access (DMA) controller (DMA CTL) 803 for directly accessing a memory address space of the physical server 300 or the logical server 310. The I/O card 800 includes a base address register (ADR REG) 801 for specifying a base address of the MM I/O of the physical server 300 or the logical server 310 for executing DMA by the DMA CTL 803, and a command register (CMD REG) 802 for specifying a request for the I/O card 800. The DMA CTL 803 executes the operation corresponding to the command written in the CMD REG 802 with respect to the address written in the ADR REG 801. It should be noted that the I/O card 800 includes a register (not shown) (e.g., a configuration register or a latency timer register) conforming to the PCI standard.

It should be noted that the I/O bus 850 may be a link through which the I/O cards 800 are each connected on a one-on-one basis, as in PCI-Express, or a link through which a plurality of I/O cards 800 are connected to a single bus, as in a PCI bus. The following description applies to either of the cases.

The switch 600 provides flexibility which enables the access from the physical server 300 or the logical server 310 to an arbitrary I/O card 800.

The I/O hub 700 includes an I/O card sharing module 750 and an I/O processor 710.

The switch 600 is connected to an SVP 900 for managing the entire configuration of the computer system. The SVP 900 is connected to a setting console 910 via a management network 920. An administrator of the system configuration uses the setting console 910 to set the entire configuration of the computer system, and more particularly, to set arrangement or partition of the physical server 300 and the logical server 310, allocation of the I/O card with respect to the physical server 300 or the logical server 310, and the like.

In the example shown in FIG. 1, two nodes 200 (i.e., node 200A and node 200B) constitute the SMP, to thereby constitute a physical server 300A. On the physical server 300A, an OS 400A is run.

Further, a node 200C constitutes a physical server 300C. On the physical server 300C, a hypervisor 500C is run. The hypervisor 500C constitutes a logical server. As a result, the physical server 300C is partitioned into two logical servers, that is, a logical server 310C1 and a logical server 310C2. An OS 400C1 is run on the logical server 310C1, and an OS 400C2 is run on the logical server 310C2.

The switch 600 connects two I/O hubs 700 (i.e., I/O hub 700A and I/O hub 700B) to each other. The I/O hub 700A is connected to a plurality of I/O cards 800 via one or more I/O buses 850. An I/O bus 850A is connected to an I/O card 800A, and an I/O bus 850B is connected to an I/O card 800B. Further, an I/O bus 850C connects two I/O cards 800C and 800D via the I/O bridge 810.

FIG. 2 is a configuration block diagram of the I/O processor 710.

The I/O processor 710 monitors the I/O access, and executes conversion processing of the I/O access.

The I/O processor 710 includes an I/O P CPU 711, an I/O P memory 712, and an I/O P north bridge 713. The I/O P CPU 711 and the I/O P memory 712 are connected to each other via the I/O P north bridge 713.

In the above-mentioned example of FIG. 1, the I/O processor 710 is provided in the I/O hub 700, but the I/O processor may be provided separately from the I/O hub 700 to be independently connected to the computer system via the switch 600. A part of the physical server 300 or the logical server 310 may provide a function corresponding to the I/O processor 710.

FIG. 3 is a function block diagram of the I/O card sharing module 750.

The I/O card sharing module 750 includes an MM I/O write decoder 751, a DMA request decoder 752, an MM I/O initializer 760, and I/O card sharing settings 740.

Upon receiving an MM I/O write system Tx 1100 transferred from the switch 600, the MM I/O write decoder 751 converts the MM I/O write system Tx 1100 into a memory write or an interrupt with respect to the I/O processor 710. The MM I/O write decoder 751 also processes a field included in the MM I/O write system Tx 1100 to generate an MM I/O write PCI Tx 1110, and issues the generated MM I/O write PCI Tx 1110 to the I/O bus 850.

Upon receiving a DMA request PCI Tx 1120 transferred from the I/O bus 850, the DMA request decoder 752 identifies the issuer guest, converts the DMA address included in the DMA request PCI Tx 1120 to generate a DMA request system Tx 1130, and issues the generated DMA request system Tx 1130 to the switch 600.

Upon receiving an I/O count up request 1140 which has been issued from the hypervisor or the guest and transferred from the switch 600, the MM I/O initializer 760 issues an I/O count up request 1141 to the I/O card. Then, upon receiving an I/O count up response 1142 from the I/O card 800, the MM I/O initializer 760 refers to contents of the I/O count up response 1142 and contents of the I/O card sharing settings 740 to generate an I/O count up response 1143 with respect to the I/O count up request 1140. Then, the MM I/O initializer 760 issues the I/O count up response 1143 to the requester.

The I/O card sharing settings 740 hold a table in which shared or exclusive use is set for each I/O card 800 connected to the I/O hub 700. The SVP 900 refers to or updates the table of the I/O card sharing settings 740.

FIG. 4 is an explanatory diagram of an example of the I/O card sharing settings 740.

The I/O card sharing settings 740 are composed of entries including a physical card ID field, an I/O card type field, and an I/O card sharing attribute field. The entries are set for each I/O card 800.

The physical card ID field stores a physical I/O card ID 741 which is an identifier of the I/O card 800. The I/O card type field stores an I/O card type 742 which is an identifier indicating the type of the I/O card 800. The I/O card sharing attribute field stores, for each of the physical servers or the logical servers, an attribute 743 which is an identifier indicating that the I/O card 800 is shared, exclusively used, or prohibited from being accessed by the server.

The example of FIG. 4 shows that, with regard to the I/O card 800 whose physical I/O card ID 741 is set to “1”, the I/O card type 742 is “FC HBA”. The I/O card 800 is shared by a server 1 and a server 2, but is prohibited from being accessed by a server N. In addition, FIG. 4 shows that the I/O card is used by the server 1, and the I/O card is not used by the server 2.
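
By way of a non-limiting sketch, one entry of the I/O card sharing settings 740 may be modeled in Python as shown below. The class names, the string server keys, and the rule that an unlisted server is treated as prohibited are assumptions made for this illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Attr(Enum):
    SHARED = "shared"
    EXCLUSIVE = "exclusive"
    PROHIBITED = "prohibited"


@dataclass
class SharingEntry:
    physical_card_id: int          # physical I/O card ID 741
    card_type: str                 # I/O card type 742
    attrs: Dict[str, Attr] = field(default_factory=dict)  # attribute 743 per server

    def may_access(self, server: str) -> bool:
        # Assumption: a server with no attribute recorded is treated as prohibited.
        return self.attrs.get(server, Attr.PROHIBITED) is not Attr.PROHIBITED


# Entry corresponding to the card with physical I/O card ID 1 described above.
card1 = SharingEntry(
    physical_card_id=1,
    card_type="FC HBA",
    attrs={
        "server 1": Attr.SHARED,
        "server 2": Attr.SHARED,
        "server N": Attr.PROHIBITED,
    },
)
```

With this entry, card1.may_access("server 1") is true and card1.may_access("server N") is false.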

FIGS. 5 to 9 show the detailed contents of each Tx sent to or received from the I/O hubs 700.

FIG. 5 is an explanatory diagram of an example of the MM I/O write system Tx 1100.

The MM I/O write system Tx 1100 includes an access type field, a destination node field, an issuer node field, a Tx ID field, a virtual MM I/O address field, and a write data field.

The access type field stores an access type 1106 of the Tx. The access type 1106 is an identifier indicating that a destination of the access is a memory or an I/O, that the access is a read or a write, and the like. The MM I/O write system Tx 1100 stores the access type 1106 indicating that the access is a write in the MM I/O area.

The destination node field and the issuer node field store a destination node 1101 and an issuer node 1102 of the Tx. The destination node 1101 and the issuer node 1102 are used by the switch 600 for the routing. In the MM I/O write system Tx 1100 received by the I/O hub 700, the destination node 1101 is an identifier indicating the I/O hub 700.

The Tx ID field stores a Tx ID 1103 that is an identifier by which the issuer node can uniquely identify each transaction.

The virtual MM I/O address field stores a virtual MM I/O address 1104. The virtual MM I/O address 1104 is a virtual address of the I/O card to be accessed which is allocated to each requester guest of the MM I/O write system Tx 1100. It should be noted that a different register is usually allocated for each MM I/O address.

The write data field stores write data 1105 which is written in response to an instruction of the MM I/O write system Tx 1100.

FIG. 6 is an explanatory diagram of an example of the MM I/O write PCI Tx 1110.

The MM I/O write PCI Tx 1110 includes an access type field, a Tx ID field, a physical MM I/O address field, and a write data field.

The access type field stores an access type 1111 indicating that the transaction is an MM I/O write PCI Tx, and that the transaction is a write or a read.

The Tx ID field stores a Tx ID 1112 that is an identifier by which the issuer node (in this case, the I/O hub 700) can uniquely identify each transaction.

The physical MM I/O address field stores a real MM I/O address 1113 (i.e., physical MM I/O address 1113) on the computer system of the I/O card 800 to be accessed.

The write data field stores write data 1114 to be written in response to an instruction of the MM I/O write PCI Tx 1110.
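
For illustration only, the field-level relation between the MM I/O write system Tx 1100 and the MM I/O write PCI Tx 1110 may be sketched in Python as follows. The class names, the string used for the access type, the Tx ID counter, and the function to_pci_tx are hypothetical. The sketch assumes that the physical MM I/O address has already been obtained from the conversion table and that the I/O hub, as issuer, assigns its own Tx ID.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class MmioWriteSystemTx:
    access_type: str           # access type 1106
    destination_node: int      # destination node 1101 (the I/O hub)
    issuer_node: int           # issuer node 1102
    tx_id: int                 # Tx ID 1103, unique to the issuer node
    virtual_mmio_address: int  # virtual MM I/O address 1104
    write_data: bytes          # write data 1105


@dataclass(frozen=True)
class MmioWritePciTx:
    access_type: str            # access type 1111
    tx_id: int                  # Tx ID 1112, unique to the I/O hub
    physical_mmio_address: int  # physical MM I/O address 1113
    write_data: bytes           # write data 1114


_hub_tx_ids = count(1)  # hypothetical source of Tx IDs owned by the I/O hub


def to_pci_tx(tx: MmioWriteSystemTx, physical_address: int) -> MmioWritePciTx:
    """Build the PCI-side write: routing fields are dropped, the address is replaced."""
    return MmioWritePciTx(
        access_type="mmio_write",
        tx_id=next(_hub_tx_ids),
        physical_mmio_address=physical_address,
        write_data=tx.write_data,
    )
```

The destination node and issuer node fields exist only on the system side because they serve routing by the switch 600, so the sketch does not carry them onto the I/O bus 850.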

FIG. 7 is an explanatory diagram of an example of the DMA request PCI Tx 1120.

The DMA request PCI Tx 1120 includes an access type field, a Tx ID field, a guest qualified DMA address field, and a write data field.

The access type field stores an access type 1121 indicating that the transaction is the DMA request PCI Tx and the transaction is a write or a read.

The Tx ID field stores a Tx ID 1122 that is an identifier by which the issuer node (in this case, an I/O card or an I/O device) can uniquely identify each transaction.

The guest qualified DMA address field stores a guest qualified DMA address 1123 including a guest identifier and a DMA address. The guest qualified DMA address 1123 will be described with reference to FIG. 8.

The write data field stores write data 1124 to be written in response to an instruction of the DMA request PCI Tx only when the DMA request PCI Tx is a write transaction.

FIG. 8 is an explanatory diagram of the guest qualified DMA address 1123.

A guest DMA address 1115 is an address of a main memory of a DMA transfer destination or a DMA transfer source recognized by the guest.

An address space of the PCI transaction has a size of 64 bits, and is generally much larger than a space of a physical memory to be actually mounted. Thus, the guest DMA address includes an unused address section 1116 in the high-order bit part, in addition to a used address section 1117 which is actually used.

A guest identifier 1125 is embedded in a part of or the whole of the unused address section 1116, thereby generating the guest qualified DMA address. Upon receiving the guest qualified DMA address, the DMA request decoder 752 refers to the guest identifier 1125 embedded in the guest qualified DMA address, to thereby identify the guest to be accessed.
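
The generation and decoding of the guest qualified DMA address may be illustrated by the following minimal Python sketch. The split of the 64-bit address into a 48-bit used address section and a 16-bit unused address section, and the names qualify and split, are assumptions chosen only for this example. The actual widths depend on the mounted physical memory.

```python
ADDR_BITS = 64
USED_BITS = 48                        # assumption: width of the used address section
GUEST_ID_BITS = ADDR_BITS - USED_BITS  # remaining high-order (unused) bits
USED_MASK = (1 << USED_BITS) - 1


def qualify(guest_dma_addr: int, guest_id: int) -> int:
    """Embed the guest identifier in the unused high-order bits."""
    if guest_dma_addr & ~USED_MASK:
        raise ValueError("guest DMA address overflows the used address section")
    if not 0 <= guest_id < (1 << GUEST_ID_BITS):
        raise ValueError("guest identifier does not fit in the unused section")
    return (guest_id << USED_BITS) | guest_dma_addr


def split(qualified_addr: int) -> tuple:
    """Recover (guest identifier, guest DMA address) from a qualified address."""
    return qualified_addr >> USED_BITS, qualified_addr & USED_MASK
```

For example, under these assumptions qualify(0x12345000, 3) yields 0x0003000012345000, and split applied to that value returns the guest identifier 3 together with the original guest DMA address.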

FIG. 9 is an explanatory diagram of an example of the DMA request system Tx 1130.

The DMA request system Tx 1130 includes an access type field, a destination node field, an issuer node field, a Tx ID field, a host DMA address field, and a write data field.

The access type field stores an access type 1136 which indicates that the transaction is the DMA request system Tx whose access target is a memory, and that the transaction is a write or a read.

The destination node field stores a destination node 1131 which includes a memory associated with a destination of the Tx.

The issuer node field stores an issuer node 1132 of the Tx. It should be noted that in the DMA request system Tx 1130, the issuer node 1132 is the I/O hub 700.

The Tx ID field stores a Tx ID 1133 that is an identifier by which the issuer can uniquely identify each transaction.

The host DMA address field stores a host DMA address 1134 which is a physical address of a memory for actually executing DMA.

The write data field stores write data 1135 when the Tx is a write transaction.

Next, an operation of the I/O card sharing module 750 will be described.

FIG. 10 is an explanatory diagram of the outline of initialization processing of the MM I/O initializer 760.

The MM I/O initializer 760 includes an MM I/O area allocation table 1230. The MM I/O area allocation table 1230 is initialized based on the I/O card sharing settings 740.

Upon receiving the I/O count up request 1140 from the guest, the MM I/O initializer 760 determines whether or not physical initialization processing for the I/O card 800 is required. In a case where the physical initialization processing for the I/O card 800 is required, the MM I/O initializer 760 issues the I/O count up request 1141 to the target I/O card.

Further, the MM I/O initializer 760 refers to the MM I/O area allocation table 1230 to determine the virtual MM I/O address to be allocated to the requester guest. Then, the MM I/O initializer 760 registers the determined virtual MM I/O address in the MM I/O address conversion table 720 which is contained in the MM I/O write decoder 751, and returns the virtual MM I/O address to the requester guest as the I/O count up response 1143.

FIG. 11 is an explanatory diagram of an example of the MM I/O area allocation table 1230.

The MM I/O area allocation table 1230 is composed of one or more entries which include a physical card ID field, a starting MM I/O address field, an address range field, a maximum sharing guest count field, and a use state field.

The physical card ID field stores a physical I/O card ID 741. The physical I/O card ID 741 has the same value as that of the physical card ID 741 of the I/O card sharing settings 740.

The starting MM I/O address field stores a starting MM I/O address 1232 which is the initial address of the MM I/O area of the I/O card. The address range field stores an address range 1233 of the MM I/O area used by the I/O card.

The maximum sharing guest count field stores a maximum sharing guest count 1234, which is the maximum number of guests by which the I/O card can be shared.

The use state field stores a use state 1235 of the I/O card. The use state 1235 is represented as a bitmap having one bit for each guest which shares the I/O card. A bit is set to 1 when it is in use, and is set to 0 when it is unused.

FIG. 12 is an explanatory diagram of an example of the MM I/O address conversion table 720.

The MM I/O address conversion table 720 is composed of one or more entries which include a virtual MM I/O address field, a physical MM I/O address field, a guest identifier field, and an I/O P memory address field.

The virtual MM I/O address field stores the virtual MM I/O address 1104 allocated to the guest. The physical MM I/O address field stores the physical MM I/O address 1113 associated with the virtual MM I/O address 1104. The guest identifier field stores the guest identifier 1125 of the guest which uses the virtual MM I/O address. The I/O P memory address field stores an I/O P memory address 1200 of the I/O P memory 712 allocated to the guest.

The I/O card sharing module 750 refers to the MM I/O address conversion table 720, to thereby mutually convert the virtual MM I/O address 1104, the physical MM I/O address 1113, the guest identifier 1125, and the I/O P memory address 1200.
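The four fields of the MM I/O address conversion table 720 and a lookup keyed by the virtual MM I/O address can be sketched as follows. The class names, the `size` attribute (assumed here to equal the address range of the card), and the linear search are illustrative assumptions; the description does not state how the table is implemented in hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversionEntry:
    virtual_mmio: int   # virtual MM I/O address 1104
    physical_mmio: int  # physical MM I/O address 1113
    guest_id: int       # guest identifier 1125
    iop_memory: int     # I/O P memory address 1200
    size: int           # assumption: bytes covered by this entry


class ConversionTable:
    def __init__(self):
        self._entries = []

    def register(self, entry):
        self._entries.append(entry)

    def lookup(self, address):
        """Return (entry, offset) for an address inside a virtual MM I/O
        window, or None when no entry covers the address."""
        for entry in self._entries:
            if entry.virtual_mmio <= address < entry.virtual_mmio + entry.size:
                return entry, address - entry.virtual_mmio
        return None
```

A single lookup yields the physical MM I/O address, the guest identifier, and the I/O P memory address together, which matches the mutual conversion described above.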

Next, an operation of initialization processing of the MM I/O initializer 760 will be described.

FIGS. 13 to 15 are flowcharts of the initialization processing of the MM I/O initializer 760.

First, the MM I/O initializer 760 receives the I/O count up request 1140 via the switch 600 (Step S1500).

The I/O count up request 1140 is normally sent as a read request from the guest to a PCI configuration space.

To be specific, the guest specifies a bus number, a device number, and a function number in the PCI configuration space. In response to this, the MM I/O initializer 760 returns whether or not the corresponding device is present. As described later, in a case where the I/O card 800 corresponding to the device number is present, the MM I/O initializer 760 returns a base address and a size (i.e., area) of the MM I/O area of the corresponding device to the guest. In a case where the device corresponding to the device number is not present, the MM I/O initializer 760 returns a value (i.e., master abort) with bits all set to 1.

Next, the MM I/O initializer 760 determines whether or not the physical initialization processing is necessary (Step S1510). It should be noted that the physical initialization processing is executed only once for each physical I/O device, for example, immediately after the physical I/O device is reset or after the I/O configuration is changed. In a case where the physical initialization processing is not necessary, the process proceeds to Step S1520. In a case where the physical initialization processing is necessary, the process proceeds to Sub-step S1610. It should be noted that processing of Sub-step S1610 will be described in detail later with reference to FIG. 15.

In Step S1520, the MM I/O initializer 760 determines whether or not the requester guest is permitted to access the target I/O card. The MM I/O initializer 760 refers to the I/O card sharing settings 740 to obtain the I/O card sharing attribute of the requester guest associated with the target I/O card. When the I/O card sharing attribute is “prohibited”, the requester guest is not permitted to access the target I/O card. In a case where the requester guest is prohibited from accessing the target I/O card, the process proceeds to Step S1530. In a case where the requester guest is permitted to access the target I/O card, the process proceeds to Step S1540.

In Step S1530, the MM I/O initializer 760 sends, to the requester guest, the I/O count up response 1143 (i.e., master abort) indicating that the target I/O card is not present.

In Step S1540, the MM I/O initializer 760 determines whether or not the target I/O card can be shared. When the I/O card sharing attribute of the requester guest indicates that the I/O card can be shared and that the I/O card is unused, the process proceeds to Sub-step S1560. In a case where the I/O card sharing attribute of the requester guest indicates that the I/O card cannot be shared, the process proceeds to Step S1550.

In Step S1550, for an I/O card that cannot be shared, the MM I/O initializer 760 returns information on the physical I/O card to the requester guest. The MM I/O initializer 760 obtains the starting MM I/O address and the address range of the I/O card from the MM I/O area allocation table 1230, and sends the starting MM I/O address and the address range to the requester guest as the I/O count up response 1143.

In Sub-step S1560, the virtual MM I/O address is allocated for the I/O card that can be shared. It should be noted that the processing of Sub-step S1560 will be described in detail below with reference to FIG. 14.
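The branching of FIG. 13 from Step S1510 to Sub-step S1560 can be summarized in a short sketch. Only the attribute value “prohibited” appears in the description; the names `EXCLUSIVE` and `SHARED`, the function `next_step`, and the handling of a shareable card with no unused entry (which the description does not specify) are assumptions.

```python
from enum import Enum


class Sharing(Enum):
    PROHIBITED = "prohibited"  # named in the text
    EXCLUSIVE = "exclusive"    # hypothetical name: card cannot be shared
    SHARED = "shared"          # hypothetical name: card can be shared


def next_step(needs_physical_init, attribute, card_unused):
    """Return the label of the step that follows Steps S1510 to S1540."""
    if needs_physical_init:
        return "S1610"  # physical initialization (FIG. 15)
    if attribute is Sharing.PROHIBITED:
        return "S1530"  # master abort: card reported as not present
    if attribute is Sharing.SHARED:
        if card_unused:
            return "S1560"  # allocate a virtual MM I/O address (FIG. 14)
        # Assumption: this case is not covered by the description.
        raise RuntimeError("shared card with no unused entry is unspecified")
    return "S1550"  # return the physical MM I/O information as it is
```

The order of the checks follows the flowchart: physical initialization is decided first, access permission second, and shareability last.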

FIG. 14 is a flowchart showing a virtual MM I/O assignment process of Sub-step S1560 shown in FIG. 13.

First, the MM I/O initializer 760 refers to the MM I/O area allocation table 1230 to obtain the use state 1235 of the I/O card 800. Then, the MM I/O initializer 760 searches the use state 1235 for an unused index (i.e., bit). After the unused index is found, the corresponding bit of the use state is changed to indicate that it is in use (Step S1570).

Next, the MM I/O initializer 760 refers to the MM I/O area allocation table 1230 to obtain a starting MM I/O address 1232 of the target I/O card. Then, based on the starting MM I/O address 1232 thus obtained, the MM I/O initializer 760 calculates the virtual MM I/O address 1104 (Step S1580).

To be specific, the address range 1233 is multiplied by a value obtained by adding 1 to the offset of the unused area, and the calculated value is added to the obtained starting MM I/O address 1232, to thereby obtain the virtual MM I/O address 1104. In other words, the virtual MM I/O address 1104 is calculated by the following formula.

“virtual MM I/O address” = “starting MM I/O address” + (“address range” × (“offset of unused area” + 1))

By using the formula, the virtual MM I/O address is obtained.

Next, the MM I/O initializer 760 registers the virtual MM I/O address 1104 calculated in Step S1580 in the MM I/O address conversion table 720. The MM I/O initializer 760 creates a new entry in the MM I/O address conversion table 720. Then, the MM I/O initializer 760 registers the obtained virtual MM I/O address 1104 in the virtual MM I/O address field. Further, the MM I/O initializer 760 registers the physical MM I/O address 1113 of the I/O card in the physical MM I/O address field. In addition, the MM I/O initializer 760 registers the guest identifier 1125 of the requester guest in the guest identifier field. Further, the MM I/O initializer 760 secures an area associated with the address range 1233 in the I/O P memory 712, and registers the address of the secured area as the I/O P memory address 1200 in the I/O P memory address field (Step S1590).

As a result, from then on, the virtual MM I/O address 1104 can be converted into the physical MM I/O address 1113 and the guest identifier 1125 can be extracted by referring to the MM I/O address conversion table 720 of the MM I/O write decoder 751.

Next, the MM I/O initializer 760 sends the virtual MM I/O address 1104 obtained in Step S1580 and the address range 1233 to the requester guest as the I/O count up response 1143 (Step S1600). After that, the virtual MM I/O address assignment process is completed.
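Steps S1570 to S1600 can be sketched as follows. The dictionary keys, the function names, and the treatment of the bit index found in the use state as the “offset” used in the formula are assumptions for illustration; securing the I/O P memory area in Step S1590 is omitted to keep the sketch short.

```python
def find_unused_index(use_state, max_guests):
    """Return the lowest bit index that is 0 in use_state, or None."""
    for index in range(max_guests):
        if not (use_state >> index) & 1:
            return index
    return None


def assign_virtual_mmio(area, guest_id, conversion_table):
    """Allocate a virtual MM I/O window for guest_id from one table entry."""
    index = find_unused_index(area["use_state"], area["max_guests"])
    if index is None:
        return None  # assumption: no unused bit remains
    area["use_state"] |= 1 << index                        # Step S1570
    virtual = area["start"] + area["range"] * (index + 1)  # Step S1580
    conversion_table.append({                              # Step S1590
        "virtual": virtual,
        "physical": area["start"],
        "guest_id": guest_id,
    })
    return virtual, area["range"]                          # Step S1600
```

For example, with a starting MM I/O address of 0xA000 and an address range of 0x100, the first guest receives 0xA100 and the second receives 0xA200.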

FIG. 15 is a flowchart showing the physical initialization processing of Sub-step S1610 shown in FIG. 13.

First, the MM I/O initializer 760 issues the I/O count up request 1141 to the I/O bus 850 (Step S1620).

Next, the MM I/O initializer 760 receives the I/O bus enumeration response 1142 from the I/O bus 850 (Step S1630).

The MM I/O initializer 760 determines whether or not the I/O card is present based on the I/O bus enumeration response 1142. In a case where the I/O card is not present, the process proceeds to Step S1650. In a case where the I/O card is present, the process proceeds to Step S1660.

In Step S1650, the MM I/O initializer 760 sends the I/O count up response 1143 (i.e., master abort) indicating that the I/O card is not present to the requester guest. Then, the physical initialization processing is completed.

In Step S1660, the MM I/O initializer 760 refers to the I/O card sharing settings 740 to obtain the maximum sharing guest count of the I/O card.

Next, the MM I/O initializer 760 determines whether or not the maximum sharing guest count is 1 (Step S1670). In a case where the maximum sharing guest count is 1, the process proceeds to Step S1680. In a case where the maximum sharing guest count is 2 or larger, the process proceeds to Step S1690.

When the maximum sharing guest count is 1, the I/O card is not shared. In other words, the requester guest uses the I/O card exclusively. Thus, in Step S1680, the MM I/O initializer 760 sends the information included in the I/O bus enumeration response 1142 received in Step S1630 as it is to the requester guest as the I/O count up response 1143. Then, the physical initialization processing is completed.

In Step S1690, the MM I/O initializer 760 registers a new entry in the MM I/O area allocation table 1230. In the entry, the starting MM I/O address 1232 and the address range 1233, which are included in the I/O bus enumeration response 1142, and the maximum sharing guest count 1234 obtained in Step S1660 are registered. Then, the use state 1235 is initialized to a bitmap indicating that the I/O card is unused by every guest.

Next, the MM I/O initializer 760 sends the I/O count up response 1143 (Step S1700). The I/O count up response 1143 includes the starting MM I/O address 1232 and the address area 1233. The address area 1233 is a value obtained by multiplying the address range 1233 by a value obtained by adding 1 to the maximum sharing guest count 1234. In other words, the address area 1233 is calculated by the following formula.

“address area” = “address range” × (“maximum sharing guest count” + 1)

Then, the physical initialization processing is completed.

By the above-mentioned process, the virtual MM I/O area for the maximum sharing guest count is reserved immediately after the physical MM I/O area.

FIG. 16 is an explanatory diagram of a relationship between the virtual MM I/O address and the physical MM I/O address.

It should be noted that FIG. 16 shows a case where the starting MM I/O address 1232 is set to “A” and the address range 1233 is set to “R”.

In this case, the starting address of the physical MM I/O address 1113 is set to “A”, so a virtual MM I/O address 1104-B allocated to a guest 1 is set to (A+R) which is an address obtained by adding R to the starting address A. In addition, a virtual MM I/O address 1104-C allocated to a guest 2 is set to (A+2R) which is a value obtained by further adding R to the value (A+R).

Thus, the virtual MM I/O address is allocated to each address range, and the respective virtual MM I/O addresses used by the respective guests are mapped without overlapping each other. Therefore, the virtual MM I/O addresses are allocated to the respective guests such that the virtual MM I/O address becomes a unique address for each guest.
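The layout of FIG. 16 and the size reserved during physical initialization can be checked with a small sketch. The function names and the dictionary labels are assumptions for illustration; the arithmetic follows the formulas given in the description.

```python
def address_area(address_range, max_guests):
    """Size reserved for one shared card: the physical window plus one
    virtual window per sharing guest."""
    return address_range * (max_guests + 1)


def layout(start, address_range, max_guests):
    """Return the half-open (base, end) window of the physical MM I/O area
    and of each guest's virtual MM I/O area."""
    windows = {"physical": (start, start + address_range)}
    for k in range(1, max_guests + 1):
        base = start + k * address_range
        windows["guest %d" % k] = (base, base + address_range)
    return windows
```

With A = 0x1000, R = 0x100, and two sharing guests, the windows are 0x1000 to 0x1100 (physical), 0x1100 to 0x1200 (guest 1), and 0x1200 to 0x1300 (guest 2). They are contiguous, do not overlap, and together span exactly the reserved 0x300 bytes.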

FIG. 17 is an explanatory diagram of an address map of the I/O P memory 712.

It should be noted that FIG. 17 shows a case where an I/O P memory address 1200-P is allocated to the guest 1 and an I/O P memory address 1200-Q is allocated to the guest 2.

These addresses are secured in the I/O P memory 712 such that the respective areas do not overlap each other. The I/O P memory 712 is used to temporarily hold the data actually written in the MM I/O area.

Next, an operation at the time of the MM I/O write request from the guest will be described.

FIG. 18 is an explanatory diagram of the MM I/O write process.

It should be noted that FIG. 18 shows an operation in a case where, after the above-mentioned setting for initialization processing shown in FIGS. 13 to 15 is completed, an access request with respect to the I/O card 800 is made by the guest.

A control register of the I/O card 800 is mapped as the MM I/O area in the memory space. Thus, the access request with respect to the I/O card 800 made by the guest is executed as a write in the MM I/O area.

A write in the MM I/O area is issued as the MM I/O write system Tx 1100 via the switch 600. The MM I/O write system Tx 1100 is trapped by the MM I/O write decoder 751 that handles the target I/O card 800.

Upon receiving the MM I/O write system Tx 1100, the MM I/O write decoder 751 refers to contents of the MM I/O write system Tx 1100, and determines which register of the I/O card 800 the access request from the guest targets. In this case, the MM I/O write decoder 751 refers to the MM I/O address conversion table 720 to obtain the physical MM I/O address 1113, the guest identifier 1125, and the I/O P memory address 1200.

It should be noted that in a case where the access request from the host is made for a command register (CMD REG), the MM I/O write decoder 751 sends an interruption to the I/O P CPU 711 of the I/O processor 710. In a case where the access request is not made for a command register (CMD REG), the MM I/O write decoder 751 writes the data of the received MM I/O write system Tx 1100 in an area of the I/O P memory 712 which is based on the I/O P memory address 1200.

The I/O P CPU 711 having received the interruption reads out an area starting from the corresponding I/O P memory address 1200, and copies the area to the physical MM I/O address 1113 on the physical I/O card 800. In this case, when copying the area associated with the address register, the MM I/O write decoder 751 embeds the guest identifier 1125 in the upper unused address section 1116 of the guest DMA address 1115 to generate the guest qualified DMA address 1123. Further, the MM I/O write decoder 751 generates the MM I/O write PCI Tx 1110 including the generated guest qualified DMA address 1123, and issues the MM I/O write PCI Tx 1110 to the I/O bus 850.

By the above-mentioned operation, even when a plurality of guests access the same I/O card 800 at the same time, the actual access to the I/O card 800 is arbitrated by the I/O processor 710. In addition, since the guest identifier 1125 for identifying each guest is embedded in the guest DMA address, even when the DMA request arrives from the I/O bus 850 afterward, it becomes possible to identify each requester guest.
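The trapping behavior of the MM I/O write decoder 751 can be sketched as follows. The placement of the command register at offset 0 of the window, the class and attribute names, and the use of a list to record pending interruptions are assumptions for illustration only.

```python
CMD_REG_OFFSET = 0x00  # assumption: command register at the start of the window


class MMIOWriteDecoder:
    def __init__(self, iop_memory_size):
        self.iop_memory = bytearray(iop_memory_size)  # stands in for I/O P memory 712
        self.interrupts = []  # guest identifiers with a pending interruption

    def write(self, entry, virtual_address, data):
        """Trap one MM I/O write addressed to a guest's virtual window.

        entry is a conversion-table row with the keys "virtual",
        "guest_id", and "iop_memory" (hypothetical key names).
        """
        offset = virtual_address - entry["virtual"]
        if offset == CMD_REG_OFFSET:
            # Command register: notify the I/O processor instead of buffering.
            self.interrupts.append(entry["guest_id"])
        else:
            # Any other register: hold the data in the guest's I/O P memory area.
            base = entry["iop_memory"] + offset
            self.iop_memory[base:base + len(data)] = data
```

Buffering ordinary register writes per guest and interrupting only on the command register is what lets the I/O processor serialize accesses from several guests to one card.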

FIG. 19 is an explanatory diagram of processing of the DMA request decoder 752.

The DMA request decoder 752 includes a DMA address conversion table 730 and an address decoder 753.

Upon receiving the DMA request PCI Tx 1120 from the I/O bus 850, the DMA request decoder 752 first extracts the guest DMA address 1115 and the guest identifier 1125 from the DMA request PCI Tx 1120.

Next, the DMA request decoder 752 refers to the DMA address conversion table 730 to obtain a pointer 731 pointing to a conversion table containing the guest DMA address 1115 and the host DMA address 1134 from the extracted guest identifier. Then, the DMA request decoder 752 refers to a table indicated by the pointer 731 obtained, to thereby obtain the host DMA address 1134 associated with the guest DMA address 1115.

Further, the DMA request decoder 752 decodes the converted host DMA address 1134 using the address decoder 753 to obtain a destination node 1131.

The DMA request decoder 752 includes the obtained destination node 1131 and the host DMA address 1134 into the DMA request system Tx 1130, and issues the DMA request system Tx 1130 to the switch 600.

It should be noted that when the pointer 731 of the DMA address conversion table 730 is NULL, the DMA request decoder 752 provides the guest DMA address 1115 as the host DMA address 1134 without conversion.

By the above-mentioned series of operations, in the case where the I/O card is shared by a plurality of guests, the guest which is the source of the access is identified, the guest identifier of the identified guest is embedded in the guest DMA address, and the guest identifier is extracted from the DMA request. This makes it possible to appropriately convert the guest DMA address into the host DMA address, and to execute the DMA transfer in which data is transferred directly to the memory space of the guest.

FIG. 20 is an explanatory diagram of an example of the DMA address conversion table 730.

As shown in FIG. 20, the DMA address conversion table 730 consists of two tables: a table 7301 containing the guest identifier 1125 and the pointer 731, and a table 7302 containing the guest DMA address 1115 and the host DMA address 1134.

The pointer 731 includes pointers indicating the guest DMA address 1115 and the host DMA address 1134 which correspond to the guest identifier in the table 7302.

It should be noted that the DMA address conversion table 730 is generated by the hypervisor 500C using the I/O card sharing settings 740. It should also be noted that in a case where the hypervisor 500C is not provided and the guest is directly run on the physical server 300, it is unnecessary to convert the DMA address, so the pointer 731 associated with the guest is set to NULL.
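The two-level lookup through the table 7301 and the table 7302, including the NULL-pointer case, can be sketched as follows. The page-granular mapping, the 4096-byte page size, and the function name are assumptions; the description states only that the table 7302 associates guest DMA addresses with host DMA addresses.

```python
PAGE_SIZE = 4096  # assumption: translation granularity


def to_host_dma_address(guest_id, guest_dma_address, pointer_table):
    """Convert a guest DMA address into a host DMA address.

    pointer_table plays the role of table 7301: it maps a guest identifier
    to a per-guest page map (table 7302), or to None when the guest runs
    directly on a physical server and no conversion is needed.
    """
    page_map = pointer_table[guest_id]  # pointer 731
    if page_map is None:
        return guest_dma_address        # NULL pointer: pass through unchanged
    page, offset = divmod(guest_dma_address, PAGE_SIZE)
    return page_map[page] * PAGE_SIZE + offset
```

For instance, if guest 1 maps guest page 0 to host page 16, the guest DMA address 0x10 becomes the host DMA address 0x10010, while a guest whose pointer is None keeps its address as it is.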

As described above, in the computer system according to the first embodiment of this invention, the virtual MM I/O address associated with the physical MM I/O address for each I/O card is allocated to each guest, thereby making it possible to share an I/O card among a plurality of physical servers and logical servers. As a result, it is possible to remove the limitation on the number of servers in the server integration, which is otherwise limited by the number of I/O cards that can be mounted, and to achieve more flexible server configuration and effective use of hardware resources.

In particular, the I/O hub mutually converts the physical MM I/O address, the virtual MM I/O address, and the guest identifier, so the I/O hub enables the DMA transfer from the I/O card while achieving the sharing of the I/O card. As a result, in a case where the I/O card 800 is shared among the physical servers and the logical servers, it is possible to suppress performance degradation.

Further, in a case of the I/O card connected via the I/O bridge, the requester guest can be identified and the DMA transfer can be executed.

FIRST MODIFIED EXAMPLE

Next, a first modified example according to the first embodiment will be described.

As described above, according to the first embodiment, the MM I/O write decoder 751 directly traps a write in the command register of the MM I/O area of the I/O card 800 and in the address register. However, in recent years, along with an increase in the speed of the I/O card, a command chain capable of activating a plurality of commands by a single MM I/O access has been employed in order to reduce the number of accesses to the MM I/O area, which require more time as compared with a main storage access.

In the first modified example, operations of the MM I/O write decoder 751 and the I/O processor 710 when the I/O card 800 uses a command chain will be described.

FIG. 21 shows an explanatory diagram of a guest MM I/O area 1220 and a guest memory area 1300 in the command chain.

In the command chain, a part of a driver memory area 1305 in the guest memory area 1300 includes a memory area for a command chain 1310. A command from a guest with respect to the I/O card 800 is written in the memory area for the command chain 1310.

An access request from the host to the I/O card 800 is temporarily stored in the command chain 1310 of the guest memory area 1300. Then, the real access request to the I/O card 800 is started with a write in a command chain tail pointer 1330 of the MM I/O area. The I/O card 800 transfers the command chain 1310 stored between an address pointed to by the command chain head pointer 1320 and an address pointed to by the command chain tail pointer 1330 to a register on the I/O card 800. As a result, commands instructed by the command chain are collectively executed.

When trapping a write in the command chain tail pointer 1330, the MM I/O write decoder 751 issues an interruption to the I/O P CPU 711 of the I/O processor. The I/O P CPU 711 having received the interruption copies the command chain 1310 to the I/O P memory 712. At this time, the guest identifier 1125 is embedded in the register associated with the DMA address so as to identify each guest when the DMA request is made. Upon completion of the copying from the guest memory area 1300 to the I/O P memory 712, the I/O P CPU 711 writes the end address of the command chain in the I/O P memory 712 to the command chain tail pointer in the real physical MM I/O area. Thus, the command of the I/O card 800 is started.

As described above, the computer system according to the first modified example of the first embodiment of this invention may be applied to the I/O card 800 using the command chain.
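The head and tail pointer behavior of a command chain can be illustrated with a minimal model. The class name, the representation of the chain as a Python list indexed by position, and the advancing of the head pointer after execution are assumptions; the description states only that commands stored between the two pointers are transferred to the card and executed collectively when the tail pointer is written.

```python
class CommandChainCard:
    """Toy model of an I/O card that consumes a command chain."""

    def __init__(self, chain_memory):
        self.chain_memory = chain_memory  # stands in for the command chain 1310
        self.head = 0                     # command chain head pointer 1320
        self.executed = []

    def write_tail_pointer(self, tail):
        """A write to the tail pointer starts every command in [head, tail)."""
        self.executed.extend(self.chain_memory[self.head:tail])
        self.head = tail  # assumption: the head advances past executed commands
```

Several commands queued in memory are thus started by one MM I/O access, which is the saving the command chain is meant to provide.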

SECOND MODIFIED EXAMPLE

Next, a second modified example according to the first embodiment will be described.

In the first embodiment, the I/O processor 710 is composed of an independent processor (i.e., I/O P CPU 711) and a memory (i.e., I/O P memory 712). In the second modified example, a part of the resources of the CPU 100 and the memory 120A included in a node is used in place of the independent processor and memory. To be more specific, a part of the resources of the server may be allocated to the I/O processor as being dedicated only to the I/O processing, or the CPU 100 may be time-shared and allocated to the I/O processing using the hypervisor 500C. It should be noted that a mode of using a part of the logically partitioned resources for the I/O processing is called an I/O partition.

The MM I/O write decoder 751 includes, instead of a direct routing to the I/O processor 710, a register for setting a base address of the memory for the I/O processing, a node ID, and an ID of the CPU to which an interruption is issued.

In the case where the above-mentioned issuance of the interruption to the I/O P CPU 711 of the I/O processor or the write in the I/O P memory 712 is required, the MM I/O write decoder 751 generates the system Tx according to the values of the register, and transfers the system Tx to the node which executes the I/O processing, via the switch 600.

Second Embodiment

Next, a second embodiment of this invention will be described.

A computer system according to the second embodiment includes a node controller 1000 having a function of both the above-mentioned I/O hub 700 and switch 600 of the first embodiment.

FIG. 22 is a configuration block diagram of the computer system according to the second embodiment.

It should be noted that components identical with those according to the first embodiment are denoted by the identical reference symbols and explanations thereof are omitted.

The node controller 1000 (node controller 1000A) includes a switch 600, an I/O hub 700, and a north bridge 110A which are consolidated as a functional module.

The node controller 1000 is provided to a node 210A.

Similarly to the above-mentioned first embodiment, an I/O hub 700 is connected to an I/O card 800 and an I/O bridge 810 via an I/O bus 850. As described above, the I/O hub 700 includes the I/O card sharing module 750 and shares the I/O card 800. On the other hand, a switch 600 of one node 210 is connected to another switch 600 of another node 210 via a node link 1010 to transfer a transaction to an arbitrary node in the entire computer system. The processing is executed in the same manner as in the above-mentioned first embodiment.

It should be noted that in the second embodiment, it is more preferable to use a part of each of the CPU 100 and the memory 120A included in the node 210 for the I/O processing, as in the second modified example of the first embodiment, rather than to include the I/O processor 710 in the I/O hub 700. In this case, the CPU 100 and the memory 120A to be used may be provided in the same node 210A as the I/O hub 700 or may be provided in a node 210A which is different from that of the I/O hub 700.

Third Embodiment

FIG. 23 is a block diagram showing a blade server system according to a third embodiment.

The blade server system includes: a plurality of server blades 10-1 to 10-n; I/O cards 501 and 502 provided with I/O interfaces of various types; a switch 250 for connecting the server blades 10-1 to 10-n to the I/O cards; an I/O card sharing module 450 for sharing the I/O cards 501 and 502 among the plurality of the server blades 10-1 to 10-n; and an I/O processor blade 650 for managing the sharing of the I/O cards. The server blades, the switch 250, and the I/O card sharing module 450 are stored in a casing (not shown).

The server blades 10-1 to 10-n each include a CPU 101 and a memory 102 which are connected together through a chip set (or an I/O bridge) 103. The chip set 103 is connected to the switch 250 through one of general buses 11-1 to 11-n. In this embodiment, PCI-EXPRESS (referred to as PCI-ex in the drawing) is adopted for the general buses 11-1 to 11-n, for example.

The CPU 101 provides servers #1 to #n by executing an OS or an application loaded on the memory 102. The CPU 101 accesses the I/O cards 501 and 502 from the chip set 103 through the switch 250 and the I/O card sharing module 450.

The switch 250 includes a header processing unit 260 for adding header information to packets sent and received between the server blades 10-1 to 10-n and the I/O cards 501 and 502 and for transferring the packets based on the header information.

The header processing unit 260 adds header information to a packet (access request signal) sent from one of the server blades 10-1 to 10-n to one of the I/O cards 501 and 502, and transfers the packet to a node (I/O card) associated with the address included in the header information. The header information defines address information (identifier) of each of the server blades 10-1 to 10-n as a requester, and address information of each of the I/O cards as a destination. The header processing unit 260 of the switch 250 transfers a packet (response signal), which has been sent from one of the I/O cards to one of the server blades 10-1 to 10-n, to the one of the server blades 10-1 to 10-n associated with the address (server identifier) included in the packet. Here, the packet transfer according to this embodiment is based on a PCI transaction (PCI-Tx), because PCI-EXPRESS is adopted as a general bus.

The I/O card sharing module 450 is connected between the switch 250 and the I/O cards 501 and 502, for sharing the I/O cards 501 and 502 among the plurality of server blades 10-1 to 10-n through the general buses 301, 311, and 312. The I/O card sharing module 450 is connected to the I/O processor blade 650 through a general bus 401. The I/O card sharing module 450 manages an address conversion and a sharing state relating to the sharing of the I/O cards, as described later. The I/O processor blade 650 has a console 5 connected thereto, through which an administrator or the like sets the sharing state of the I/O cards 501 and 502.

The I/O cards 501 and 502 are each provided with an interface such as a SCSI (or a SAS), a fibre channel (FC), or Ethernet (registered trademark). The I/O cards 501 and 502 are each further provided with a direct memory access (DMA) controller 513 for directly accessing each of the memories 102 in the server blades 10-1 to 10-n. The I/O cards 501 and 502 are each further provided with a base address register 511 for designating a base address of memory mapped I/O (MM I/O) of the memory 102 on any one of the server blades 10-1 to 10-n performing DMA through the DMA controller 513, and with a command register 512 for designating an instruction to be given to the I/O cards 501 and 502. The DMA controller 513 executes the operation that corresponds to the command written in the command register 512 with respect to the memory 102 whose address is written in the base address register 511. The I/O cards 501 and 502 each include a register (not shown) (such as a configuration register or a latency timer register) conforming to the PCI standard.

Next, the I/O processor blade 650 will be described. The I/O processor blade 650 includes a CPU 602 and a memory 603 which are connected together through a chip set (or an I/O bridge) 601. The chip set 601 is connected to the I/O card sharing module 450 through the general bus 401. A predetermined control program is executed on the I/O processor blade 650 as described later, which executes processing such as an address conversion in response to an I/O access from any one of the server blades 10-1 to 10-n.

(I/O Card Sharing Module)

Next, the I/O card sharing module 450 according to this invention will be described in detail with reference to the block diagram of FIG. 24.

The I/O card sharing module 450, which is provided between the server blades 10-1 to 10-n and the I/O cards 501 and 502, performs an address conversion on an I/O access packet from any one of the server blades 10-1 to 10-n, to thereby make it possible to share a single I/O card among the plurality of server blades 10-1 to 10-n. Here, the switch 250, the general buses, and the I/O cards 501 and 502 each conform to PCI-EXPRESS. Hereinafter, the I/O access packet is referred to as a PCI transaction.

The I/O card sharing module 450 mainly has three functions as follows:

1) a function of writing a PCI transaction, which is sent from the server blades 10-1 to 10-n to the I/O cards 501 and 502, into the memory 603 of the I/O processor blade 650;

2) a function of issuing an interrupt request for the CPU 602 of the I/O processor blade 650, based on a write request for the command registers 512 of the I/O cards 501 and 502; and

3) a function of converting an address of the PCI transaction based on DMA from the I/O cards 501 and 502 to the server blades 10-1 to 10-n.

In FIG. 24, the I/O card sharing module 450 is mainly composed of: a content addressable memory 410 for storing an address information table 411 described later; a header information extracting unit 406 for separating header information from the PCI transaction sent from any one of the server blades 10-1 to 10-n; a first transaction decoder 402 (referred to as Tx decoder 1 in the drawing) for analyzing a main body of the PCI transaction excluding the header information and sending an instruction to the I/O processor blade 650; a second transaction decoder 403 (referred to as Tx decoder 2 in the drawing) for analyzing a signal from the I/O processor blade 650 and sending an instruction to the I/O cards 501 and 502; a third transaction decoder 404 (referred to as Tx decoder 3 in the drawing) for analyzing the PCI transaction from the I/O cards 501 and 502 and correcting (converting) the destination address if the transaction is based on the DMA transfer; an interruption generating unit 407 for issuing an interruption to the CPU 602 of the I/O processor blade 650 based on the instruction provided by the first transaction decoder 402; a memory writing unit 408 for performing writing to the memory 603 of the I/O processor blade 650 based on the instruction provided by the first transaction decoder 402; a signal selecting unit 412 for selecting a signal to be outputted to the I/O processor blade 650, based on the instruction provided by the first transaction decoder 402; and a signal selecting unit 413 for selecting a signal to be outputted to the servers #1 to #n side (switch 250) based on the instruction provided by the third transaction decoder 404.

Here, as shown in FIG. 25, the PCI transaction consists of a PCI transaction main body 461, which stores data such as a command, an order, or a server address (MM I/O base address) 462, and header information 451, which stores routing information. The header information 451 includes a destination 452 at its head, followed by a requester 453 identifying the issuer of the PCI transaction. For example, for a PCI transaction from the server blade 10-1 to the I/O card 501, routing information (such as an address) of the I/O card 501 is defined as the destination 452, routing information of the server blade 10-1 is defined as the requester 453, and the PCI transaction main body 461 defines address information on the I/O register of the server blade 10-1, in addition to the command and the order, as the MM I/O base address 462.

In FIG. 24, reference numeral 301-1 denotes an outbound PCI transaction from the switch 250 to the I/O card sharing module 450 through the general bus 301 connecting the switch 250 (server blades 10-1 to 10-n side) and the I/O card sharing module 450. Reference numeral 301-2 denotes an inbound transaction from the I/O card sharing module 450 to the switch 250 (server blades 10-1 to 10-n side) through the general bus 301. Similarly, reference numeral 401-2 denotes an outbound instruction signal from the I/O card sharing module 450 to the I/O processor blade 650 through the general bus 401 connecting the I/O card sharing module 450 and the I/O processor blade 650, and reference numeral 401-1 denotes an inbound instruction signal (or the PCI transaction) from the I/O processor blade 650 to the I/O card sharing module 450 through the general bus 401. Further, reference numerals 311-1 and 312-1 denote inbound PCI transactions from the I/O cards 501 and 502, respectively, to the I/O card sharing module 450 through the general buses 311 and 312, which connect the I/O card sharing module 450 to the I/O cards 501 and 502, respectively. Also, reference numerals 311-2 and 312-2 denote outbound PCI transactions from the I/O card sharing module 450 to the I/O cards 501 and 502 through the general buses 311 and 312, respectively.

In the I/O card sharing module 450, upon receiving the outbound PCI transaction 301-1 from the switch 250 (server side), the header information extracting unit 406 separates the PCI transaction into the header information 451 and the PCI transaction main body 461 as shown in FIG. 25. The header information extracting unit 406 further extracts an offset from the MM I/O base address defined in the PCI transaction. The header information extracting unit 406 then inputs the header information 451 and the offset into the content addressable memory 410, and also inputs the PCI transaction main body 461 to the first transaction decoder 402.

The content addressable memory 410 includes a CAM (content addressable memory), and holds the address information table 411 defined by the I/O processor blade 650. The address information table 411 stores access permission information (allocation information) on each of the servers #1 to #n with respect to the I/O cards 501 and 502 connected to the I/O card sharing module 450, as described later.

Then, the content addressable memory 410 receives an address to be found (header information 451) as a search key (bit string), and outputs the address associated with that search key from a preset table (address information table 411). As described later, the content addressable memory 410 refers to the inputted header information 451, and outputs the MM I/O base address and the address of the memory 603 on the I/O processor blade 650 that are both associated with the header information 451.

The first transaction decoder 402 refers to the PCI transaction main body 461 received from the header information extracting unit 406 and the MM I/O base address received from the content addressable memory 410 so as to analyze an instruction of the PCI transaction main body 461, to thereby select an outbound instruction signal 401-2 to be outputted to the I/O processor blade 650. When the instruction of the PCI transaction main body 461 is not a predetermined instruction, the transaction decoder 402 transfers the PCI transaction received by the I/O card sharing module 450 to one of the I/O cards 501 and 502 as the outbound PCI transaction without making any modification thereto.

Explained next is a memory space in a case where the plurality of server blades 10-1 to 10-n (servers #1 to #n) share a single I/O card. In the following embodiment, three servers #1 to #3 share a single I/O card 501.

The servers #1 to #3 each set an I/O area in the memory 102 of each of the servers for one I/O card, and the I/O area is associated with the MM I/O address space. For example, as shown in FIG. 26, for each I/O card to be used (or shared) (in this case, the I/O card 501), the servers #1 to #3 (server blades 10-1 to 10-3) each have an MM I/O area defined with an MM I/O base address of 0xA, 0xB, or 0xC and an offset, which indicates a size of the memory space, of 0xS, 0xY, or 0xZ, respectively. Those MM I/O base addresses and offsets are determined by a BIOS or the OS activated in the server blades 10-1 to 10-n.

In correspondence with the MM I/O of each of the servers #1 to #3, the memory spaces 6031 to 6033 are set in the memory 603 of the I/O processor blade 650, as shown in FIG. 27, for the I/O card 501 which is to be shared by the servers #1 to #3, as described later. In FIG. 27, the memory space 6031 (0xP) is set in the memory 603 of the I/O processor blade 650 so as to be associated with the MM I/O base address 0xA of the server #1. It should be noted that the I/O processor blade 650 sets a memory space on the memory 603 only for the MM I/O of a server which shares the target I/O card, based on an I/O card sharing settings table 610 (see FIG. 29) set in the memory 603. Similarly, the memory space 6032 (0xQ) is set so as to be associated with the MM I/O base address 0xB of the server #2 which shares the I/O card 501, and the memory space 6033 (0xR) is set so as to be associated with the MM I/O base address 0xC of the server #3.

Then, the I/O areas associated with the MM I/O base addresses of the servers #1 to #3, that is, 0xA, 0xB, and 0xC, share the I/O card 501. Accordingly, the address information table 411 of the content addressable memory 410 is set as shown in FIG. 28 by the I/O processor blade 650.

The address information table 411 of FIG. 28 includes: a header 4111 which is to be compared with the header information 451 of the PCI transaction received by the I/O card sharing module 450 from any one of the servers #1 to #3; an MM I/O base address 4112 which is to be outputted when the header information 451 has matched the header 4111 in the address information table 411; an address (referred to as IoP ADDR in the drawing) 4113 of the memory space in the I/O processor blade 650, which is outputted when the header information 451 has matched the header 4111 of the address information table 411; and an offset 4114 which is to be compared with the offset inputted to the content addressable memory 410. The header 4111, the MM I/O base address 4112, the address 4113, and the offset 4114 are preset.

As described above, in the case where the servers #1 to #3 share the I/O card 501, the address information of the servers #1 to #3 is set in the header 4111 of the address information table 411 of FIG. 28, while the MM I/O base addresses of the servers #1 to #3 shown in FIG. 26 are set in the MM I/O base address 4112. In the memory space address 4113, the address information of the memory spaces 6031 to 6033 is set so as to be associated with the MM I/O base addresses of the servers #1 to #3 as shown in FIG. 27. In the offset 4114, a difference from the MM I/O base address is set so as to obtain the sizes of the memory spaces of the servers #1 to #3.

In the header 4111, “Io1”, which indicates the address information of the I/O card 501 and corresponds to the destination 452 of the header information 451, is paired with each of “SV1” to “SV3”, which indicate the address information of the servers #1 to #3 and correspond to the requester 453 that has requested the I/O access. Each pair is set as an address for comparison.

For example, the PCI transaction from the server #1 (server blade 10-1) to the I/O card 501 includes the destination 452 of the header information 451 having “Io1” set thereto, and the requester 453 having “SV1”, which is the address information of the server #1, set thereto. The header information 451 extracted by the header information extracting unit 406 is inputted to the address information table 411 of the content addressable memory 410, so that the content addressable memory 410 outputs the MM I/O base address 0xA of the server #1 and the address 0xP of the memory space 6031 of the I/O processor blade 650 for sharing the I/O card 501 by the server #1.
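The lookup just described can be illustrated with a minimal Python sketch. It is not the hardware implementation: a dictionary stands in for the content addressable memory 410, the names `Entry`, `ADDRESS_INFO_TABLE`, and `cam_lookup` are hypothetical, and every numeric address and size is a placeholder chosen for illustration (the symbolic values 0xA, 0xP, and so on in FIGS. 26 to 28 are not concrete numbers). The sketch also assumes that comparing the inputted offset with the offset 4114 means checking that the access falls inside the server's MM I/O area.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    mmio_base: int  # MM I/O base address 4112
    iop_addr: int   # address 4113 (IoP ADDR) on the I/O processor blade
    size: int       # offset 4114, treated here as the size of the MM I/O area


# Key is the header 4111: (destination 452, requester 453).
# All numeric values are hypothetical placeholders.
ADDRESS_INFO_TABLE = {
    ("Io1", "SV1"): Entry(0xA000, 0x1000, 0x100),
    ("Io1", "SV2"): Entry(0xB000, 0x2000, 0x100),
    ("Io1", "SV3"): Entry(0xC000, 0x3000, 0x100),
}


def cam_lookup(destination: str, requester: str, offset: int) -> Optional[Entry]:
    """Return the matching entry, or None when header or offset does not match."""
    entry = ADDRESS_INFO_TABLE.get((destination, requester))
    if entry is None:
        return None
    # Assumption: the offset matches when it lies inside the area size.
    if not 0 <= offset < entry.size:
        return None
    return entry
```

For instance, `cam_lookup("Io1", "SV1", 0x10)` returns the entry for the server #1, while a header for an I/O card that is not in the table yields `None`.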

The MM I/O base address outputted from the content addressable memory 410 is inputted to the first transaction decoder 402, and the address of the memory space 6031 is inputted to the signal selecting unit 412.

The following explanation is given on the operation of the transaction decoders 402 to 404 with the above-mentioned memory space.

The first transaction decoder 402 extracts an instruction regarding the I/O cards 501 and 502 from the PCI transaction main body 461 received from the header information extracting unit 406. Based on the contents of the instruction, the I/O card sharing module 450 decides what signal to output to the I/O processor blade 650 or to the I/O cards 501 and 502 as follows.

A) In the case where the instruction extracted from the PCI transaction main body 461 is a write instruction to the command register 512 of each of the I/O cards 501 and 502 (instruction to start the operation of the I/O card, for example, a DMA transfer starting command), the transaction decoder 402 provides instructions to the interruption generating unit 407 to output an interruption, and also to the signal selecting unit 412 to select the output from the interruption generating unit 407 to output an outbound instruction signal 401-2 of the general bus 401 to the I/O processor blade 650. This interruption signal includes the address 4113 of the I/O processor blade 650 outputted by the content addressable memory 410. For the interruption to be executed, the header information 451 and the offset each inputted to the content addressable memory 410 need to match the header 4111 and the offset 4114 of the address information table 411, respectively.

B) In the case where the instruction extracted from the PCI transaction main body 461 is a write instruction to a register (for example, the base address register 511) other than the command register 512 of each of the I/O cards 501 and 502 (instruction not to start the operation of the I/O card, such as a DMA initialization request), the transaction decoder 402 provides instructions to the memory writing unit 408 to write the PCI transaction (including the header information 451 and the PCI transaction main body 461) to a predetermined memory space of the I/O processor blade 650, and also provides instructions to the signal selecting unit 412 to select the output from the memory writing unit 408 to output an outbound instruction signal 401-2 of the general bus 401 to the I/O processor blade 650. The memory writing unit 408 writes the header information 451 and the PCI transaction main body 461 to the address 4113 of the I/O processor blade 650 outputted from the content addressable memory 410.

C) In the case where the instruction extracted from the PCI transaction main body 461 is an instruction other than a write request to the register of each of the I/O cards 501 and 502, the transaction decoder 402 outputs the PCI transaction, which has been received from a signal line 420, as one of the outbound PCI transactions 311-2 and 312-2 without making any modification thereto, through the general buses 311 and 312 connecting the I/O card sharing module 450 and the I/O cards 501 and 502. In this case, the transaction decoder 402 refers to the destination 452 of the header information 451 of the PCI transaction so as to select one of the general buses 311 and 312, which connect to the I/O cards 501 and 502, respectively, depending on which one of the I/O cards 501 and 502 is associated with the destination 452.

In any of the above cases A) to C), the transaction decoder 402 refers to I/O card sharing settings 405 described later, and prohibits access when the destination I/O card is not allocated to the requester server. Also, when the transaction decoder 402 refers to the I/O card sharing settings 405 and finds that the allocation state (attribute) is “EXCLUSIVE”, indicating an exclusive use of the I/O card, the transaction decoder 402 transfers the PCI transaction as it is to the destination I/O card according to the function described in C) above, without performing address decoding by the content addressable memory 410, to thereby allow normal access in which the servers #1 to #n and the I/O cards 501 and 502 directly execute the I/O processing.
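The decision made by the first transaction decoder 402 in cases A) to C), together with the check against the I/O card sharing settings 405, can be sketched as follows. This is an illustrative Python model under stated assumptions: the names `Action`, `SHARING_SETTINGS`, and `decode_outbound` are hypothetical, a transaction is reduced to four fields, and the allocation values simply mirror the example of FIG. 29.

```python
from enum import Enum


class Action(Enum):
    INTERRUPT_IOP = "A"     # case A): interrupt the I/O processor blade
    WRITE_IOP_MEMORY = "B"  # case B): write the transaction to the blade's memory
    FORWARD_TO_CARD = "C"   # case C): pass the transaction to the card unmodified
    DENY = "denied"         # requester has no allocation for the destination card


# Hypothetical contents of the I/O card sharing settings 405.
SHARING_SETTINGS = {
    "Io1": {"SV1": "SHARED", "SV2": "SHARED", "SV3": "SHARED"},
    "Io2": {"SV2": "EXCLUSIVE"},
}


def decode_outbound(destination, requester, is_write, target_register):
    """Classify a server-side transaction the way cases A) to C) describe."""
    state = SHARING_SETTINGS.get(destination, {}).get(requester)
    if state is None:
        return Action.DENY
    if state == "EXCLUSIVE":
        # Exclusive cards are accessed directly, without address decoding.
        return Action.FORWARD_TO_CARD
    if is_write and target_register == "command":
        return Action.INTERRUPT_IOP
    if is_write:
        return Action.WRITE_IOP_MEMORY
    return Action.FORWARD_TO_CARD
```

Checking the allocation first keeps the prohibition independent of the instruction type, which matches the statement that it applies in all of cases A) to C).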

As described above, the transaction decoder 402 of the I/O card sharing module 450, which is provided between the servers #1 to #n and the I/O cards 501 and 502, converts the writing in the register of each of the I/O cards 501 and 502 into an operation for the I/O processor blade 650 (write processing or interruption processing with respect to the memory space), thereby making it possible to share the I/O cards 501 and 502, neither of which has a sharing function. For example, when the servers #1 to #n issue a DMA transfer request (DMA initialization request) to the I/O cards 501 and 502, the transaction decoder 402 writes the MM I/O base address, to which the DMA transfer is to be performed, in the memory space of the I/O processor blade 650, according to the function described in B) above. Next, when the servers #1 to #n provide instructions to start the DMA transfer, the transaction decoder 402 interrupts the I/O processor blade 650 according to the function described in A) above, and the CPU 602 of the I/O processor blade 650, in place of the servers #1 to #n, writes in the command register 512 and the base address register 511 of either of the I/O cards 501 and 502, causing that card to perform DMA. Then, one of the I/O cards 501 and 502 performs a DMA transfer to the requester servers #1 to #n, following the instruction given by the I/O processor blade 650, which has issued the I/O access request in place of the servers #1 to #n. It should be noted that the detailed operation of the I/O processor blade 650 will be described later. Further, upon receiving a request from the server blades 10-1 to 10-n to refer to a configuration register (not shown) of the I/O card for initialization at startup, the transaction decoder 402 interrupts the CPU 602 of the I/O processor blade 650, and writes the PCI transaction in the memory 603.

Described next is a main function of the second transaction decoder 403 of the I/O card sharing module 450. That is, the second transaction decoder 403 performs filtering such that an inbound instruction signal 401-1 received from the I/O processor blade 650 through the general bus 401 is outputted exclusively to the I/O card 501 or to the I/O card 502 which is shared.

Accordingly, the I/O card sharing module 450 includes a register 430 for storing the I/O card sharing settings 405 as an area for referring to an I/O card sharing settings table 610 (see FIG. 29) provided on the memory 603 of the I/O processor blade 650.

Here, as shown in FIG. 29, the I/O card sharing settings table 610 includes attributes for indicating each server allocated (available) for each I/O card, which is set by the administrator through the console 5 or the like. The table is composed of an identifier 611 including address information (such as a device number) of an I/O card, a type 612 indicating a function of the I/O card, and allocation states 613 to 615 of the servers #1 to #3.

FIG. 29 shows a relationship between (attributes of) the servers #1 to #3 and the I/O cards 501 and 502, in which an I/O card 1 under the identifier 611 corresponds to the I/O card 501, and the type 612 indicates a SCSI card with the attribute “SHARED”, indicating that the I/O card 1 is shared by the servers #1, #2, and #3. The number of the allocation states 613 to 615 changes in accordance with the number of servers operating.

The I/O card 2 under the identifier 611 corresponds to the I/O card 502, which is a NIC card as indicated by the type 612. The attribute “EXCLUSIVE” indicates that the I/O card 2 is not shared by the other servers #1 and #3. The number of the allocation states 613 to 615 changes in accordance with the number of servers operating. Access from the servers #1 and #3 to the I/O card 2 is prohibited because the I/O card 2 is not allocated (available) to the servers #1 and #3.

Upon receiving the PCI transaction from the I/O processor blade 650, the transaction decoder 403 extracts a server address (MM I/O base address) from the PCI transaction main body 461 to compare the address with the MM I/O base addresses 4112 in the address information table 411, and if the address matches any one of the MM I/O base addresses, obtains the destination and the requester from the header 4111 of the entry of that MM I/O base address. Next, the transaction decoder 403 compares the destination thus obtained with the identifier of the I/O card sharing settings 405, to thereby search for the server which is identical to the requester obtained from the matching entry. When the server thus found has the I/O card associated with the destination allocated thereto, which justifies the PCI transaction received by the transaction decoder 403, the transaction decoder 403 outputs the PCI transaction to the I/O card as the destination. On the other hand, when the corresponding server does not have the I/O card associated with the destination allocated thereto, which means that the I/O access request is unjustifiable, the transaction decoder 403 discards the PCI transaction. The I/O card sharing module 450 may also notify the server of the error after the PCI transaction is discarded.
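The filtering performed by the second transaction decoder 403 can be sketched in Python as below. The names `HEADER_BY_MMIO_BASE`, `ALLOCATION`, and `filter_from_iop` are hypothetical, the numeric base addresses are placeholders, and the sketch assumes that a transaction whose server address matches no table entry is discarded as well, a case the text does not spell out.

```python
from typing import Optional

# Reverse view of the address information table 411:
# MM I/O base address 4112 -> header 4111 as (destination, requester).
# Numeric addresses are hypothetical placeholders.
HEADER_BY_MMIO_BASE = {
    0xA000: ("Io1", "SV1"),
    0xB000: ("Io1", "SV2"),
    0xC000: ("Io1", "SV3"),
    0xD000: ("Io2", "SV1"),  # stale entry: the server #1 is not allocated Io2
}

# Servers allocated to each card in the I/O card sharing settings 405.
ALLOCATION = {"Io1": {"SV1", "SV2", "SV3"}, "Io2": {"SV2"}}


def filter_from_iop(server_address: int) -> Optional[str]:
    """Return the destination card, or None when the transaction is discarded."""
    header = HEADER_BY_MMIO_BASE.get(server_address)
    if header is None:
        return None  # assumption: an unmatched address is also discarded
    destination, requester = header
    if requester in ALLOCATION.get(destination, set()):
        return destination
    return None
```

The stale entry at 0xD000 shows why the second check matters: the address is found in the table, yet the transaction is dropped because the card is not allocated to that server.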

Described next is a main function of the third transaction decoder 404 of the I/O card sharing module 450. That is, the third transaction decoder 404 converts the header information 451 and a server address of the PCI transaction main body 461 so as to return the inbound PCI transactions 311-1 and 312-1, which have been received from the I/O cards 501 and 502 through the general buses 311 and 312, to the one of the servers #1 to #n that is the requester of the I/O access.

The transaction decoder 404 determines whether the PCI transactions 311-1 and 312-1, which have been received from the I/O card side, require an address conversion (as in the case of DMA) or not (as in the case of an event such as an interruption), and selects either the output of the transaction decoder 404 or the inbound transactions 311-1 and 312-1 by using the signal selecting unit 413.

When the PCI transactions 311-1 and 312-1 received from the I/O card side are DMA transfers, the transaction decoder 404 determines that the address conversion is necessary and instructs the signal selecting unit 413 to select an output from the transaction decoder 404. On the other hand, when the PCI transactions do not require the address conversion, the transaction decoder 404 instructs the signal selecting unit 413 to output the received PCI transactions 311-1 and 312-1 without making any modification thereto.

The transaction decoder 404 determines whether the address conversion is necessary or not, depending on whether the PCI transaction main body 461 shown in FIG. 25 includes identifiers (address information, etc.) of the servers #1 to #n at predetermined significant bits set in an unused area of the MM I/O base address 462, as described later. Specifically, the transaction decoder 404 determines that the address conversion is necessary when the PCI transaction main body 461 includes identifiers of the servers #1 to #n (hereinafter, referred to as “server identifier”) at the significant bits of the MM I/O base address 462. When the PCI transaction main body 461 includes no server identifier, the transaction decoder 404 determines that the address conversion is unnecessary.

(I/O Processor Blade)

Next, the function of the I/O processor blade 650 is explained in the following. FIG. 30 is a functional block diagram mainly showing the I/O processor blade 650.

In FIG. 30, the memory 603 of the I/O processor blade 650 stores the I/O card sharing settings table 610 of FIG. 29 and the memory spaces 6031 and 6032 (represented by 603 x in the drawing) of the I/O cards shared by the plurality of servers. The memory 603 further includes an interruption processing unit 620 loaded from a ROM or the like (not shown), which is activated upon an interruption (denoted by INT in the drawing) from the I/O card sharing module 450. In activating the server blades 10-1 to 10-n, an initialization processing unit 630 is loaded onto the memory 603 from a ROM or the like (not shown) upon an interruption from the I/O card sharing module 450.

The I/O card sharing settings table 610 is appropriately set by the administrator or the like through the console 5 connected to the I/O processor blade 650, as described above, by which the allocation between the I/O cards and the servers #1 to #n is defined. The memory space 603 x of the memory 603 is set by the CPU 602 upon activation of the servers #1 to #n as described later. When the PCI transaction received from the switch 250 corresponds to B) above, for example, when the PCI transaction includes a write command (DMA initialization request) to the base address register 511 of the I/O card, the I/O card sharing module 450 writes the PCI transaction main body 461 and the header information 451 into the memory space 603 x which corresponds to the I/O card to be accessed and the requester server.

After that, when the PCI transaction received from the switch 250 corresponds to A) above (writing to the command register 512), the I/O card sharing module 450 interrupts the CPU 602 of the I/O processor blade 650 so as to activate the interruption processing unit 620.

The interruption processing unit 620 reads the header information 451 and the PCI transaction main body 461, which were written in advance into the memory space 603 x, based on the address of the memory space 603 x included in the interruption instruction provided by the I/O card sharing module 450. When the PCI transaction main body 461 includes a DMA transfer command, the interruption processing unit 620 temporarily converts the header information 451 and the MM I/O base address 462 included in the PCI transaction main body 461 as described later, and writes the MM I/O base address thus converted into the address register 511 of the I/O card to be activated.

Next, the interruption processing unit 620 writes an instruction (for example, a DMA transfer starting instruction) included in the PCI transaction main body 461 which has been interrupted, into the command register 512 of the I/O card which corresponds to the destination 452 of the header information 451, to thereby activate the operation of the I/O card.

Next, the following explanation is given regarding the address conversion described above to be performed by the interruption processing unit 620 in the case of the DMA transfer.

As shown in FIG. 25, according to PCI-EXPRESS or PCI, 64 bits (bits 0 to 63 in the drawing) are defined as the MM I/O address space of the PCI transaction. Also, as regards the CPU 101 of each of the servers #1 to #n (server blades 10-1 to 10-n), a CPU capable of 64-bit addressing is becoming increasingly common. It is not realistic, however, to provide each of the server blades 10-1 to 10-n with a memory 102 that fully occupies the 64-bit address space, and therefore, a memory space of several tens of GB at maximum is usually provided under present circumstances. Accordingly, a predetermined width of less than 64 bits, for example, 52 bits (bits 0 to 51), is defined as the used area shown in FIG. 25 for the address bus of the CPU 101 or the like.

Therefore, while the MM I/O address space has 64 bits defined thereto, the significant bits of the address space are unused when the memory is installed in each of the server blades 10-1 to 10-n.

As described above, in FIG. 25, the significant bits 52 to 63 constitute an unused area 463 in the MM I/O base address 462 included in the PCI transaction main body 461 when the 52 less significant bits are defined as an accessible address space. The unused area 463 of FIG. 25 includes 12 bits, while the blade server system may include several tens of server blades, so it is possible to identify all the servers in the casing by using, for example, as few as 6 of the 12 bits in the unused area.

On the other hand, an I/O card conforming to PCI-EXPRESS cannot by itself identify the plurality of servers #1 to #n. Once the DMA transfer is started, the I/O card only recognizes the MM I/O base address 462 set on initialization, which makes it impossible to perform the DMA transfer to the plurality of servers #1 to #n.

Therefore, according to this invention, in the case of the DMA transfer, the unused significant bits of the MM I/O base address 462 of the PCI transaction are used for storing the server identifier (address information), and the interruption processing unit 620 of the I/O processor blade 650 buries the requester 453 serving as the server identifier in the unused significant bits of the MM I/O base address 462, to thereby perform address conversion.
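The address conversion described here amounts to simple bit manipulation, sketched below in Python. The 52-bit used area and 12-bit unused area follow the example of FIG. 25; the function name `embed_server_id` and the use of a small integer as the server identifier are assumptions made for illustration. The sketch also assumes the identifier is nonzero, since the document later detects a DMA transaction by the unused area not being all zero.

```python
USED_BITS = 52    # bits 0 to 51: address space actually used by the servers
UNUSED_BITS = 12  # bits 52 to 63: unused area 463


def embed_server_id(mmio_base: int, server_id: int) -> int:
    """Bury a server identifier in the unused significant bits of an address."""
    if mmio_base < 0 or mmio_base >> USED_BITS:
        raise ValueError("MM I/O base address does not fit in the used area")
    # Assumption: identifier 0 is reserved so that an all-zero unused area
    # can mean "no server identifier buried".
    if not 0 < server_id < (1 << UNUSED_BITS):
        raise ValueError("server identifier must fit in the unused area")
    return (server_id << USED_BITS) | mmio_base
```

With a hypothetical base address of 0xA000 and identifier 1, the converted value is 0x001000000000A000: the low 52 bits still hold the original address, and the identifier occupies bit 52.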

Then, the interruption processing unit 620 writes the MM I/O base address 462 thus converted in the base address register 511 of the I/O card, and writes the start of the DMA transfer in the command register 512, to thereby start DMA of the I/O card.

After the DMA transfer is started by the I/O card, the transaction decoder 404 of the I/O card sharing module 450 extracts, upon receiving the PCI transaction from the I/O card, the server identifier buried in the unused area 463 which is the significant bits of the MM I/O base address 462, and writes the server identifier thus extracted in the destination 452 of the header information 451 in the PCI transaction. Then, the transaction decoder 404 writes “0” in the area from which the server identifier was extracted in the MM I/O base address 462, to thereby delete the route information of the requester buried in the area. After that, the transaction decoder 404 transfers the PCI transaction to the switch 250. The switch 250 further transfers the PCI transaction to a server designated as the destination based on the header information of the PCI transaction, that is, one of the servers #1 to #n which has requested the DMA transfer.

In other words, the interruption processing unit 620 of the I/O processor blade 650 writes, in the address register 511 of the I/O card, an address in which the requester 453 is buried as a server identifier in the unused area 463 of the MM I/O base address, thereby activating the DMA transfer of the I/O card. Accordingly, with respect to the DMA transfer outputted from the I/O card, the server identifier is extracted from the unused area 463 of the MM I/O base address 462 in the PCI transaction by the I/O card sharing module 450, which sets the server identifier in the destination 452 of the header information 451. Thus, the I/O card can be shared by means of the I/O card sharing module 450 and the I/O processor blade 650 even when the I/O card itself does not have a function of identifying the plurality of servers #1 to #n.

(I/O Card Sharing Processing)

Next, FIG. 31 shows processing of I/O card sharing, mainly through a PCI transaction, by the I/O card sharing module 450 and the I/O processor blade 650.

As shown in FIG. 31, in S1, the servers #1 to #n each set a DMA initialization command or the like in the PCI transaction and send the PCI transaction to the I/O card to which the I/O access is made. Each of the servers #1 to #n sets address information (server identifier) of its own in the requester 453 of the header information 451 of the PCI transaction, and sets the MM I/O base address allocated for the I/O card by each of the servers as the MM I/O base address 462.

In S2, when the I/O card sharing module 450 provided between the I/O card and the servers receives the PCI transaction, the PCI transaction is written in the memory space 603 x of the I/O processor blade 650 because the PCI transaction includes the DMA initialization command containing the write command to the address register 511 of the I/O card. At this time, the I/O card is not accessed.

In S3, when the servers #1 to #n each send the PCI transaction instructing the start of the DMA transfer to the I/O card, the I/O card sharing module 450 interrupts the I/O processor blade 650 to activate the interruption processing unit 620 because the PCI transaction includes a command to activate the operation of the I/O card.

The interruption processing unit 620 reads the MM I/O base address 462 from the PCI transaction written in the memory space 603 x, and writes the MM I/O base address 462 in the address register 511 of the I/O card. At this time, the interruption processing unit 620 buries the requester 453 of the header information 451 indicating the requester server in the unused area 463 of the MM I/O base address 462 within the PCI transaction. Then, the interruption processing unit 620 writes the start of the DMA transfer in the command register 512 of the I/O card to activate the operation of the I/O card.

In S4, the I/O card performs the DMA transfer (write or read) with respect to the MM I/O base address set in the address register 511.

In the PCI transaction by DMA from the I/O card, the server identifier is buried in the unused area 463 set as the significant bits of the MM I/O base address 462.

In S5, upon reception of the PCI transaction, the I/O card sharing module 450 provided between the I/O card and the servers #1 to #n uses the transaction decoder 404 shown in FIG. 24 to judge whether or not the PCI transaction is performed by DMA.

The transaction decoder 404 judges whether or not the PCI transaction from the I/O card is performed by DMA as follows. When at least one bit of the unused area 463 of the MM I/O base address 462 is not “0”, the transaction decoder 404 judges that a server identifier is buried, and therefore that the PCI transaction is performed by DMA.

In the case of the PCI transaction by DMA, the transaction decoder 404 sets the contents of the unused area 463 of the MM I/O base address 462 in the destination 452 of the header information 451, converting those contents into address information of the servers #1 to #n that is identifiable by the switch 250. After that, the transaction decoder 404 sets all the bits of the unused area 463 to “0” to delete the contents of the unused area 463, and sends the PCI transaction.
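The judgment and rebuilding performed by the transaction decoder 404 can be sketched in Python as follows. The names `rebuild_inbound` and `SERVER_ADDRESS_BY_ID`, the dictionary-based header, and the mapping from numeric identifiers to “SV1” to “SV3” are assumptions for illustration; the 52-bit boundary follows the example of FIG. 25.

```python
USED_BITS = 52
ADDRESS_MASK = (1 << USED_BITS) - 1

# Hypothetical mapping from a buried identifier to routing information
# that the switch 250 can interpret.
SERVER_ADDRESS_BY_ID = {1: "SV1", 2: "SV2", 3: "SV3"}


def rebuild_inbound(header: dict, address: int):
    """Return (header, address) as forwarded to the switch 250.

    A nonzero unused area means a server identifier is buried, i.e. the
    transaction is DMA; otherwise the transaction passes through unchanged.
    """
    server_id = address >> USED_BITS
    if server_id == 0:
        return header, address
    new_header = dict(header, destination=SERVER_ADDRESS_BY_ID[server_id])
    # Clear the unused area so that the server sees its original address.
    return new_header, address & ADDRESS_MASK
```

For a transaction whose address is 0x002000000000B000, the sketch yields destination “SV2” and address 0xB000, while an interruption-like transaction with an all-zero unused area is returned as received.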

Based on the destination 452, the switch 250 transfers the PCI transaction by DMA to the requester server of DMA set in the destination 452, and makes predetermined access with respect to the MM I/O set as the servers #1 to #n.

As described above, the identifier of the server requesting DMA (requester 453) is set in the predetermined significant bits of the MM I/O base address set in the address register 511 of the I/O card. Therefore, even a general-purpose I/O card can be shared by the plurality of server blades 10-1 to 10-n, because the I/O card sharing module 450 replaces the information stored in the destination 452 of each of the PCI transactions with the address information of the server requesting DMA even when DMA is repeatedly made.

A time chart of FIG. 32 shows the above-mentioned processing in time series. First, in S11, each of the servers #1 to #n sets the DMA initialization command or the like in the PCI transaction and sends the PCI transaction to the I/O card to which the I/O access is made.

In S12, the I/O card sharing module 450 writes the contents of the PCI transaction in the memory space of the I/O processor blade 650 since the PCI transaction includes a write request with respect to the registers other than the command register 512 of the I/O card.

Next, in S13, each of the servers #1 to #n sets a write command to the command register 512 such as the DMA transfer start command in the PCI transaction, and sends the PCI transaction to the I/O card making the I/O access.

In S14, because the PCI transaction includes the write command to the command register 512 of the I/O card, the I/O card sharing module 450 requests interruption to the CPU 602 of the I/O processor blade 650.

In S15, the CPU 602 of the I/O processor blade 650 activates the interruption processing unit 620 and reads out the contents other than those of the command register 512, that is, the contents of the address register 511 or the like.

In S16, when the contents of the PCI transaction read by the interruption processing unit 620 are processed by DMA, the requester 453 of the header information 451 is set in the unused area 463 of the MM I/O base address 462. Then, the MM I/O base address 462 subjected to address conversion is written in the address register 511 of the I/O card, and the operation of the I/O card is activated by writing the DMA transfer start command in the command register 512.

In S17, the DMA controller 513 of the I/O card makes the DMA access to the memory space specified by the address register 511.

In S18, upon reception of the PCI transaction from the I/O card, the transaction decoder 404 judges whether the PCI transaction is performed by DMA in the manner described above. When judging that the PCI transaction is performed by DMA, the transaction decoder 404 carries out the conversion of the address information (rebuilding processing) by setting the server identifier of the unused area 463 of the MM I/O base address 462 of the PCI transaction main body 461 in the destination 452 of the header information 451. Then, the transaction decoder 404 sends the PCI transaction to the server requesting DMA via the switch 250.

The above-mentioned procedure allows the DMA controller 513 of the I/O card to execute the processing by burying the identifier of the server requesting DMA in the unused area 463 of the MM I/O base address 462. Thus, a single I/O card can be shared by the plurality of servers #1 to #n.
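It should be noted that the handling of S12, S14, and S15 in the time chart, in which writes to registers other than the command register 512 are only buffered while a write to the command register 512 raises an interruption, can be sketched as follows. The class IoProcessor, the function sharing_module_receive, and the dictionary layout of a transaction are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass, field

# Hypothetical register name; the description only distinguishes the
# command register 512 from the other registers of the I/O card.
COMMAND_REGISTER = "command"


@dataclass
class IoProcessor:
    memory: list = field(default_factory=list)      # stands in for memory 603
    interrupts: list = field(default_factory=list)  # interruptions to CPU 602

    def handle_interrupt(self, txn: dict) -> list:
        """S15: read back what was buffered for the interrupting requester."""
        return [t for t in self.memory if t["requester"] == txn["requester"]]


def sharing_module_receive(iop: IoProcessor, txn: dict) -> str:
    """Classify a write from a server as in S12 and S14."""
    if txn["register"] == COMMAND_REGISTER:
        iop.interrupts.append(txn)   # S14: command write raises an interruption
        return "interrupt"
    iop.memory.append(txn)           # S12: other writes are only buffered
    return "buffered"
```

In this sketch the buffered address-register write of a given requester is what the interruption handler later retrieves, which corresponds to the point at which the requester would be buried in the MM I/O base address in S16.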

(Setting Processing of Address Information)

Next, FIG. 33 shows a time chart illustrating an example of setting processing of an address information table 411 executed when server blades 10-1 to 10-n are activated.

The address information table 411 stored in the content addressable memory (CAM) 410 is updated by the processing shown in FIG. 33 every time the server blades 10-1 to 10-n are activated. It should be noted that a case where a server blade 10-1 is activated will be described below.

First, the server blade 10-1 is turned on in S20. In S21, when the power of the server blade 10-1 is turned on, the CPU 101 activates a BIOS (not shown) and issues a read request with respect to the configuration register of each of the I/O cards in order to perform initialization of the I/O card (device).

In S22, upon reception of the read request of the configuration register of the I/O card, the transaction decoder 402 of the I/O card sharing module 450 interrupts the CPU 602 of the I/O processor blade 650, and writes the PCI transaction including the read request of the configuration register in the memory 603 as described above. At this time, because the MM I/O is not set yet in the server blade 10-1 from which the read request of the configuration register has been sent, the transaction decoder 402 writes the read request in the address set in advance.

In S23, the CPU 602 of the I/O processor blade 650 activates the initialization processing unit 630 shown in FIG. 30 by the interruption of the transaction decoder 402. The initialization processing unit 630 confirms the I/O card allocated to the server blade by reading the I/O card sharing settings table 610 from the read request of the configuration register written to the predetermined address.

In S24, when the I/O card is not allocated to the server blade 10-1 (access prohibition), the initialization processing unit 630 notifies the server blade 10-1 that the allocation is not made via the I/O card sharing module 450 (handled as master abort). On the other hand, when the I/O card is allocated to the server blade 10-1, the initialization processing unit 630 reads the contents of the configuration register of the I/O card and responds to the server blade 10-1. It should be noted that the processing of S24 is executed sequentially for every I/O card set in the I/O card sharing settings table 610.

In S25, upon reception of the contents of the configuration register of the I/O card from the I/O processor blade 650, the server blade 10-1 sets the MM I/O space or the I/O space based on the obtained information of the I/O card, and performs setting or the like of the MM I/O base address. Then, the server blade 10-1 notifies the MM I/O base address to the I/O card. The notification is executed for each I/O card.

In S26, the I/O card sharing module 450 receives the PCI transaction notifying the MM I/O base address from the server blade 10-1. Because the PCI transaction includes the setting notification of the MM I/O base address, the I/O card sharing module 450 interrupts the CPU 602 of the I/O processor blade 650, and then writes the PCI transaction notifying the MM I/O base address in the memory 603. It should be noted that the processing is performed in the same manner as in S22.

In S27, the CPU 602 of the I/O processor blade 650 activates the initialization processing unit 630 by the interruption of the transaction decoder 402. The initialization processing unit 630 allocates the memory space 6031 associated with the I/O card of the server blade 10-1 to the memory 603 in response to the setting notification of the MM I/O base address of the server blade 10-1 written to the predetermined address. Then, the initialization processing unit 630 notifies the MM I/O base address and the offset of the server blade 10-1, the address of the memory space 6031 of the I/O processor blade 650, the address information of the allocated I/O card, and the address information of the server blade 10-1 to the I/O card sharing module 450, to thereby reflect those addresses in the address information table 411 of the CAM 410. It should be noted that the processing of S27 is repeatedly executed for each I/O card used by the server blade 10-1.
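It should be noted that the entries reflected in the address information table 411 in S27 can be pictured by the following sketch, in which a plain list stands in for the CAM 410. The field names, the class names AddressEntry and AddressInfoTable, and the treatment of the "offset" as the extent of the MM I/O window are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AddressEntry:
    server_addr: int      # address information of the server blade
    mmio_base: int        # MM I/O base address set by the server blade
    mmio_size: int        # "offset", assumed here to be the window extent
    iop_memory_addr: int  # memory space 6031 on the I/O processor blade
    card_addr: int        # address information of the allocated I/O card


class AddressInfoTable:
    """List-based stand-in for the table 411 held in the CAM 410."""

    def __init__(self) -> None:
        self._entries = []

    def register(self, entry: AddressEntry) -> None:
        # S27: executed once per I/O card used by the activated server blade.
        self._entries.append(entry)

    def lookup(self, server_addr: int, mmio_addr: int) -> Optional[AddressEntry]:
        # Match on both the requesting server and its MM I/O window, so two
        # servers may choose the same MM I/O base address without conflict.
        for e in self._entries:
            if (e.server_addr == server_addr
                    and e.mmio_base <= mmio_addr < e.mmio_base + e.mmio_size):
                return e
        return None
```

Keying the lookup on the server as well as the MM I/O address is the design point of the sketch: each server sets its own MM I/O base address independently, so the address alone would not identify the memory space on the I/O processor blade.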

By the processing described above, direct access is not made by the activated server blades 10-1 to 10-n to the I/O cards 501 and 502 shared by the plurality of server blades 10-1 to 10-n. Further, the I/O processor blade 650 obtains the contents of the configuration register as a substitute, and performs setting or the like of the memory space 603x. In the blade server system, a new server blade can be provided and activated while the other server blades are being operated. In such a case, the I/O processor blade 650 responds in place of the I/O card at the time of activation of the new server blade, thereby making it possible to activate the new server blade without influencing the I/O access of the other server blades being operated.

It should be noted that the case where the server blade is activated by the BIOS mounted thereto is described above. However, the processing can be performed in the same manner as described above even in the case of activating the server blade by an extensible firmware interface (EFI) (not shown).

(I/O Card Sharing Settings Table)

Hereinafter, the I/O card sharing settings table 610 shown in FIG. 29 will be explained.

As described above, the I/O card sharing settings table 610 shows which one of the I/O cards is allocated to which server, and the administrator suitably performs settings thereof from the console 5 or the like. The attributes of the I/O cards, that is, “shared”, “exclusive”, and “access prohibited” with respect to the servers #1 to #n are displayed on a display of the console 5 at an interface shown in FIG. 29, and may be set through an interface such as a mouse and a keyboard (not shown).

The identifier 611 of the I/O card sharing settings table 610 is suitably set by the administrator each time the I/O card is added or changed. In addition, the type 612 indicating functions or the like of the I/O card can be set in the table by reading a class code and a subclass code of the configuration register of the I/O card.

With regard to the allocation states 613 to 615 of respective servers #1 to #3, the administrator suitably sets the presence/absence of the sharing and the presence/absence of the allocation based on the characteristics, performance, and the like of the servers #1 to #3 and the I/O cards. In FIG. 29, “shared” is set when the I/O card is allocated to the plurality of servers to be shared thereby, and “not allocated” is set when no allocation is made. When the card is exclusively used by a single server, “exclusive” is set and access to this I/O card from other servers is prohibited. A server whose I/O card allocation is set to “not allocated” is denied access when it attempts to access the I/O card. When the denied access is a read access, the I/O card sharing module 450 responds with data in which every bit is 1, and when the denied access is a write access, the I/O card sharing module 450 notifies of the master abort.
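It should be noted that the allocation states and the handling of a denied access described above can be sketched as follows. The table contents, the card and server names, the 32-bit read width, and the function name check_access are hypothetical and are given only for illustration; the actual contents of FIG. 29 are not reproduced here.

```python
from enum import Enum


class Allocation(Enum):
    SHARED = "shared"
    EXCLUSIVE = "exclusive"
    NOT_ALLOCATED = "not allocated"


# Hypothetical settings: card1 is shared by servers 1 and 2,
# card2 is used exclusively by server 3.
SETTINGS = {
    "card1": {"server1": Allocation.SHARED,
              "server2": Allocation.SHARED,
              "server3": Allocation.NOT_ALLOCATED},
    "card2": {"server1": Allocation.NOT_ALLOCATED,
              "server2": Allocation.NOT_ALLOCATED,
              "server3": Allocation.EXCLUSIVE},
}

ALL_ONES_32 = 0xFFFFFFFF  # assumed 32-bit read width


def check_access(card: str, server: str, is_read: bool):
    """Return ("ok", None), ("data", all-ones) or ("master_abort", None)."""
    state = SETTINGS.get(card, {}).get(server, Allocation.NOT_ALLOCATED)
    if state is not Allocation.NOT_ALLOCATED:
        return ("ok", None)
    if is_read:
        return ("data", ALL_ONES_32)   # denied read: every bit is 1
    return ("master_abort", None)      # denied write: master abort
```

In this sketch an unknown card or server is treated as “not allocated”, which is a conservative choice made for the illustration rather than a behavior stated in this description.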

The I/O card sharing settings table 610 is reflected on the I/O card sharing settings 405 of the I/O card sharing module 450. Then, the second transaction decoder 403 managing the PCI transaction from the I/O processor blade 650 to the I/O card permits only the valid transaction and prohibits illegal transaction, that is, access by the servers to which the I/O cards are not allocated in the I/O card sharing settings table 610, based on the I/O card sharing settings 405.

By the I/O card sharing settings table 610, all the I/O cards can be shared. However, one I/O card may be exclusively used by a certain server while another I/O card is shared by servers, in order to secure throughput or the like. Accordingly, “shared” and “exclusive” I/O cards can be present at the same time and it becomes possible to flexibly structure the I/O device of the blade server system, whereby the resources of the I/O device can be efficiently used.

Fourth Embodiment

FIG. 34 is a block diagram of a blade server system according to a fourth embodiment of this invention. In the blade server system of the fourth embodiment, a command chain control unit 514 is additionally provided to the I/O card of the third embodiment. In addition, the blade server system is composed of I/O cards 1501 and 1502 for performing I/O processing by sequentially reading a data structure set in a memory 102 of each of servers #1 to #n. Other structures thereof are the same as those of the third embodiment.

Each of the I/O cards 1501 and 1502 sequentially reads the data structure set in the memory 102 of each of the server blades 10-1 to 10-n and performs an I/O operation in accordance with the description of each of the data structures. In other words, each of the I/O cards 1501 and 1502 performs a so-called command chain processing.

As shown in FIG. 35, the servers #1 to #n operated by the server blades 10-1 to 10-n each set data structures 1020 to 1022 in the predetermined address in the memory space of each of the servers #1 to #n in using the I/O cards.

For example, three data structures 1020 (CCWs 1 to 3) are set in the address 0xD in the memory space of the server #1, two data structures 1021 (CCWs 11 and 12) are set in the address 0xE in the memory space of the server #2, and four data structures 1022 (CCWs 21 to 24) are set in the address 0xF in the memory space of the server #3. It should be noted that each of those data structures is suitably set by the OS or the application of each of the servers #1 to #3.

A flag indicating the presence of a subsequent data structure is set in the head and intermediate data structures. For example, the flag indicating the presence of a subsequent data structure is set in the CCWs 1 and 2 of the data structure 1020, and the flag is not set in the CCW 3, which indicates that the CCW 3 is the final data structure.
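It should be noted that the role of the flag described above can be illustrated by the following sketch, in which a chain is read until a data structure without the flag is reached. The class Ccw, the function walk_chain, the numeric addresses, and the assumption that the data structures of one chain occupy consecutive slots are all introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Ccw:
    name: str
    chained: bool   # True when a subsequent data structure follows


def walk_chain(memory: dict, start: int) -> list:
    """Read data structures one by one until the chain flag is not set."""
    executed = []
    addr = start
    while True:
        ccw = memory[addr]
        executed.append(ccw.name)   # stands in for performing the I/O operation
        if not ccw.chained:
            return executed         # final data structure reached
        addr += 1                   # assumption: consecutive slots, one CCW each
```

For a chain laid out as CCW 1 (flag set), CCW 2 (flag set), and CCW 3 (flag not set), walk_chain visits the three data structures in order and stops after CCW 3.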

Each of the servers #1 to #3 sends a command to activate the operation to the I/O card 1501 or 1502 to be used, and notifies the I/O card 1501 or 1502 of the addresses of the data structures 1020 to 1022 set in each of the memory spaces.

Upon reception of the address set in the memory spaces along with the command of activation from each of the servers #1 to #n, each of the I/O cards 1501 and 1502 reads the data structure in the specified memory space to execute I/O access written in the data structures 1020 to 1022.

An example is shown below in which an I/O card sharing module 450 and an I/O processor blade 650 identical to those of the third embodiment are applied to the blade server system equipped with the I/O cards 1501 and 1502 for performing the command chain processing described above.

FIG. 36 is a time chart showing a sharing processing of the I/O cards by the I/O card sharing module 450 and the I/O processor blade 650. It should be noted that a case where the server #1 (server blade 10-1) uses the I/O card 1501 will be explained below.

First, the server #1 sends a PCI transaction instructing to activate the operation to the I/O card 1501 for making the I/O access (S31). Through the PCI transaction, an MM I/O base address allocated to the I/O card 1501 by the server #1 is set as the MM I/O base address 462. Further, the command instructing to activate the operation and the address 0xD of the data structure 1020 are set in the PCI transaction main body 461.

Upon reception of the PCI transaction from the server #1, the I/O card sharing module 450 analyzes the command of the PCI transaction. Because the command to activate the operation is a write command to the command register 512 of the I/O card 1501, the I/O card sharing module 450 interrupts the CPU 602 of the I/O processor blade 650 (S32). At the same time, the I/O card sharing module 450 writes contents of the PCI transaction in the predetermined memory space 6031 of the I/O processor blade 650 as shown in FIG. 27 (S33).

The CPU 602 activated by the interruption activates the interruption processing unit 620 of the third embodiment. Then, the interruption processing unit 620 reads the address 0xD of the data structure 1020 of the PCI transaction written in the memory space to obtain the requester 453 of the MM I/O base address 462. Next, the interruption processing unit 620 reads the data structure 1020 from the address obtained with respect to the memory 102 of the server #1 of the requester 453, and copies the data structure 1020 in the predetermined memory space of the I/O processor blade 650 (S34). It should be noted that, as shown in FIG. 37, each of the memory spaces is an area provided for storing the data structure set in advance for each of the servers #1 to #n. In this example, addresses 0xS, 0xT, and 0xU are set as the memory spaces for the data structure of each of the servers #1 to #3, and the data structure 1020 of the server #1 is stored in the address starting from 0xS as shown in FIG. 37.

Next, the interruption processing unit 620 executes processing of the MM I/O used in DMA or the like. The requester 453 of the header information 451 set in the unused area 463 of the MM I/O base address 462 through the PCI transaction written in the memory space 6031 is written in the address register 511 of the I/O card 1501 as the target, thereby executing an address conversion (S35).

After that, based on the command of the received PCI transaction to activate the operation of the I/O card, the interruption processing unit 620 writes the command to activate the operation in the command register of the I/O card 1501 and notifies the command chain control unit 514 of the address 0xS of the data structure 1020 (S36).

The I/O card 1501 is activated based on the activation command of the interruption processing unit 620, and reads one data structure (CCW 1) 1020 from the received address 0xS of the memory 603 of the I/O processor blade 650 (S37). With respect to the communication from the I/O card 1501 to the I/O processor blade 650, the I/O card sharing module 450 performs the transfer as it is.

The I/O card 1501 performs the I/O operation in accordance with the description of the data structure 1020. For example, when the read data structure 1020 is processed by DMA, the I/O card 1501 executes DMA transfer with respect to the MM I/O base address set in the address register 511 (S38). In the PCI transaction by DMA from the I/O card, a server identifier is buried in the unused area 463 set as the significant bit of the MM I/O base address 462.

Upon reception of the PCI transaction, the I/O card sharing module 450 provided between the I/O card 1501 and the server #1 judges whether or not the PCI transaction is performed by DMA by the transaction decoder 404 shown in FIG. 24.

The transaction decoder 404 judges whether the PCI transaction from the I/O card is performed by DMA as follows. When the bits of the unused area 463 of the MM I/O base address 462 are not all “0”, it is judged that the server identifier is buried therein and that the PCI transaction is performed by DMA.

In the case of the PCI transaction by DMA, the transaction decoder 404 sets the contents of the unused area 463 of the MM I/O base address 462 in the destination 452 of the header information 451, and converts the contents thereof into address information of the servers #1 to #n that is identifiable by the switch 250. After that, the transaction decoder 404 sets all the bits of the unused area 463 to “0” to delete the contents of the unused area 463, and sends the PCI transaction (S39).

Based on the destination 452, the switch 250 transfers the PCI transaction by DMA to the requester server of DMA set in the destination 452, and makes predetermined access with respect to the MM I/O set as the servers #1 to #n.

When the I/O operation specified by the data structure (CCW 1) 1020 is completed, the I/O card 1501 reads the subsequent data structure (CCW 2) from the specified address 0xS of the memory 603 of the I/O processor blade 650 to execute the operation in the same manner as described above.

Thus, in the case of the I/O cards 1501 and 1502 for performing the command chain processing, the data structures 1020 to 1022 of respective servers #1 to #3 are copied to the memory space of the memory 603 of the I/O processor blade 650, whereby the I/O processor blade 650 responds to the read request from the I/O cards 1501 and 1502 in place of each of the servers #1 to #3.
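It should be noted that the copying of a server's data structures into the memory 603 (S34) and the handing of the copied address to the I/O card (S36) can be sketched as follows. The function name stage_chain, the dictionary representation of the memories, and the numeric addresses standing in for 0xD and for the symbolic address 0xS are hypothetical and serve only as an illustration.

```python
def stage_chain(server_memory: dict, src: int, iop_memory: dict, dst: int) -> int:
    """Copy one server's chain into the I/O processor memory.

    Returns the address in the I/O processor memory that would be handed
    to the I/O card in place of the address in the server's own memory.
    """
    offset = 0
    while True:
        ccw = server_memory[src + offset]
        iop_memory[dst + offset] = ccw   # copy made on behalf of the server
        if not ccw["chained"]:
            return dst                   # last data structure has been copied
        offset += 1                      # assumption: consecutive slots
```

After staging, every read of the chain by the I/O card is served from the I/O processor memory, so the I/O card does not need to know which server's memory originally held the data structures.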

Therefore, by notifying the I/O card of the addresses of the memory 603 storing the data structures 1020 to 1022 of the servers requesting the I/O access at the time of activation of the I/O cards 1501 and 1502, it becomes possible for the I/O cards for performing the command chain processing to be shared by a plurality of servers #1 to #3.

Fifth Embodiment

FIG. 38 is a block diagram of a blade server system according to a fifth embodiment of this invention. In this embodiment, an I/O card sharing module 450 is incorporated in the switch 250 of the first or fourth embodiment to be integrated therewith.

A switch 250A includes the I/O card sharing module 450 and operates in the manner as described in the first or fourth embodiment. By incorporating the I/O card sharing module 450 in the switch 250A, it becomes possible to reduce the number of slots mounted to the blade server system, and to structure the casing in a compact manner.

(Summary)

As described above, according to this invention, in the case of performing DMA transfer while sharing a single I/O card by a plurality of server blades 10-1 to 10-n, the I/O processor blade 650 buries in the MM I/O base address the identifier of the server requesting DMA to the address register 511 of the I/O card at the time of start of the DMA transfer, and the I/O card sharing module 450 relaying the PCI transaction to the server from the I/O card replaces the information stored in the destination 452 of the header information 451 with the server identifier buried in the MM I/O base address, thereby making it possible to share an I/O card having a general-purpose bus by the servers.

After the start of the DMA transfer, because the I/O processor blade 650 does not intervene in the transfer of the PCI transaction and hardware such as the I/O card sharing module 450 performs the address conversion, the sharing of the I/O card by the plurality of servers can be realized while preventing the overhead due to software processing and deterioration in performance of the I/O access as in the conventional case.

In addition, the I/O card shared by the plurality of server blades 10-1 to 10-n may be composed of a general-purpose interface, so in performing server integration as described above, the I/O card conventionally used can be used as it is, thereby making it possible to suppress the increase in cost in performing the server integration.

Further, in performing the server integration, a single I/O card can be shared by the plurality of server blades 10-1 to 10-n, so it becomes possible to reduce the number of I/O cards and to use a compact casing while preventing an increase in the number of I/O cards as in the conventional case.

Further, by the I/O card sharing settings table 610, the I/O card shared by the plurality of servers and the I/O card exclusively used by a single server blade can be present at the same time, so it becomes possible to flexibly structure the blade server system.

It should be noted that in the third to fifth embodiments described above, an example in which PCI-EXPRESS is used as the general-purpose bus is shown. However, a general-purpose bus such as PCI or PCI-X may also be employed.

In addition, in the third to fifth embodiments described above, an example in which one of the server blades 10-1 to 10-n corresponds to one of the servers #1 to #n is shown. However, the server identifier may be set as the logical partition number when each of the servers #1 to #n is configured by the logical partition of the virtual computer.

Further, in the third to fifth embodiments described above, an example is shown in which the I/O processor blade 650 is directly coupled to the I/O card sharing module 450. However, the I/O processor blade 650 may be coupled to the switch 250. Further, the server blade may execute the processing of the I/O processor blade 650 in place of the I/O processor blade 650. In the third to fifth embodiments described above, the I/O card sharing module 450 and the I/O processor blade 650 are independent of each other. However, an I/O hub mounted with the I/O card sharing module 450 and the I/O processor blade 650 may be employed as in the first embodiment.

As described above, this invention can be applied to a computer system composed of a plurality of servers and to a chip set thereof.

While the present invention has been described in detail and pictorially in the accompanying drawings, the present invention is not limited to such detail but covers various obvious modifications and equivalent arrangements, which fall within the purview of the appended claims.

1. A computer system, comprising: at least one node composed of at least one processor and memory; an I/O hub connecting at least one I/O card; a switch connecting the node and the I/O hub, and a server run by one or a plurality of the at least one node, wherein the server is set in advance to allow one of exclusive use and shared use of the I/O card connected to the I/O hub via the switch, wherein the I/O hub allocates a virtual MM I/O address unique to each server to a physical MM I/O address associated with each I/O card, wherein the I/O hub keeps allocation information indicating relation between the allocated virtual MM I/O address, the physical MM I/O address, and a server identifier unique to the server, and wherein, when a request to access the I/O card is received from the server, the I/O hub refers to the allocation information to extract the server identifier from the access request, and based on the extracted server identifier, identifies the server that has issued the access request.
 2. The computer system according to claim 1, wherein, when a request to access the I/O card is received from the server, the I/O hub sends the server identifier, along with the received access request, to the I/O card, and wherein, when a DMA request is received from the I/O card in response to the access request, the I/O hub extracts the server identifier from the DMA request, and based on the extracted server identifier, identifies the server to which the DMA request is made.
 3. The computer system according to claim 2, wherein the I/O hub converts a DMA address contained in the DMA request into an address that is associated with the server identified as a server to which the DMA request is made, and wherein the I/O hub transfers the DMA request sent from the I/O card to a memory space in the identified server at the converted address.
 4. The computer system according to claim 2, wherein, when a request to access the I/O card is received from the server, the I/O hub buries the server identifier in a significant bit of a DMA address contained in the access request, and wherein, when a DMA request is received from the I/O card in response to the access request, the I/O hub extracts the server identifier buried in a significant bit of a DMA address contained in the DMA request, and based on the extracted server identifier, identifies the server to which the DMA request is made.
 5. A computer system, comprising: at least one node composed of at least one processor, memory, and node controller, each node controller connecting the nodes to one another and connecting at least one I/O card; and a server run by one or a plurality of the at least one node, wherein the node controller allocates a virtual MM I/O address unique to each server to a physical MM I/O address associated with each I/O card, wherein the node controller keeps allocation information indicating relation between the allocated virtual MM I/O address, the physical MM I/O address, and a server identifier unique to the server, and wherein, when a request to access the I/O card is received from the server, the node controller refers to the allocation information to extract the server identifier from the access request, and based on the extracted server identifier, identifies the server that has issued the access request.
 6. The computer system according to claim 5, wherein, when a request to access the I/O card is received from the server, the node controller sends the server identifier, along with the received access request, to the I/O card, and wherein, when a DMA request is received from the I/O card in response to the access request, the node controller extracts the server identifier from the DMA request, and based on the extracted server identifier, identifies the server to which the DMA request is made.
 7. The computer system according to claim 6, wherein the node controller converts a DMA address contained in the DMA request into an address that is associated with the server identified as a server to which the DMA request is made, and wherein the node controller transfers the DMA request sent from the I/O card to a memory space in the identified server at the converted address.
 8. The computer system according to claim 6, wherein, when a request to access the I/O card is received from the server, the node controller buries the server identifier in a significant bit of a DMA address contained in the access request, and wherein, when a DMA request is received from the I/O card in response to the access request, the node controller extracts the server identifier buried in a significant bit of a DMA address contained in the DMA request, and based on the extracted server identifier, identifies the server to which the DMA request is made.
 9. A computer system, comprising: a plurality of servers; a switch connecting the plurality of servers to an I/O card from which data can be written in memory spaces of the servers; an I/O card sharing unit interposed between the switch and the I/O card to operate an access request signal and a response signal which are exchanged between the servers and the I/O card; and an I/O processor which can access the I/O card and the I/O card sharing unit and which has a memory and a processor to manage allocation of the I/O card to the plurality of servers, the memory accepting a data write from the I/O card sharing unit, the processor accepting an interruption command from the I/O card sharing unit, wherein the switch has a header processing unit which sends an access request signal to the I/O card sharing unit after attaching a destination of the access request signal and routing information of a requester to header information of the access request signal, the access request signal containing an instruction made by one of the servers to the I/O card and a base address specifying one of the memory spaces, and which transfers a response signal to a server specified as a destination of the response signal by header information of the response signal, the response signal containing a response from the I/O card to the server and a base address specifying the memory space, wherein the I/O card sharing unit includes: a request signal writing unit which writes the access request signal in a memory of the I/O card sharing unit when the instruction in the access request signal received from the switch is a preset first instruction; an interruption causing unit which interrupts the I/O processor when the instruction in the access request signal received from the switch is a preset second instruction; a header modifying unit which, when the response signal from the I/O card to the server is received, uses the base address contained in the response signal to extract routing information of the requester, and sets the extracted routing information of the requester as a destination in the header information of the response signal; a base address modifying unit which deletes the routing information of the requester buried in the base address in the response signal; and a transmission unit which sends the response signal to the switch, wherein the I/O processor includes: an address conversion unit which, when the interruption happens, obtains the routing information of the requester from the header information of the access request signal and sets the routing information in a given significant bit of the base address contained in the access request signal that is written in the memory; a response address setting unit which sets, as the destination of the response from the I/O card, the base address set in the significant bit of the header information of the requester through the interruption operation; and an I/O card activating unit which sends the access request signal to the I/O card and activates an operation of the I/O card through the interruption operation, and wherein the I/O card sends a response signal of processing relevant to the access request signal to the I/O card sharing unit in response.
 10. The computer system according to claim 9, wherein the I/O card includes: a command register which sets an instruction contained in the access request signal; and an address register which designates a server memory space to which the response to the access request signal is sent, wherein, when the instruction in the access request signal is the first instruction, which includes processing of writing in other registers in the I/O card than the command register, the request signal writing unit writes the access request signal received from the switch into the memory of the I/O card sharing unit, wherein, when the instruction in the access request signal received from the switch is the second instruction, which includes processing of writing in the command register, the interruption causing unit interrupts the I/O processor, and wherein the response address setting unit writes, in the address register of the I/O card, the base address set in the significant bit of the header information of the requester through the interruption operation.
11. The computer system according to claim 10, wherein, when the instruction in the access request signal includes DMA initialization processing, in which a data write in the address register is requested, the request signal writing unit writes the access request signal received from the switch into the memory of the I/O card sharing unit, and wherein, when the access request signal received from the switch contains a start DMA instruction, which requests a data write in the command register, the interruption causing unit interrupts the I/O processor.
12. The computer system according to claim 9, wherein the I/O processor has an I/O card sharing settings management unit to set in advance attributes for identifying which server is allowed to use the I/O card and for discriminating whether the I/O card is shared or not, and wherein, when the I/O card sharing settings management unit is referred to and the attributes of the server of the requester indicate that the I/O card that is the destination of the access request signal received from the switch is available to and exclusively used by the requester server, the I/O card sharing unit transfers the request signal from the server as it is to the I/O card, and transfers the response signal from the I/O card to the server without adding any changes to the response signal.
13. The computer system according to claim 9, wherein the I/O processor has an I/O card sharing settings management unit to set in advance attributes for identifying which server is allowed to use the I/O card and for discriminating whether the I/O card is shared or not, and wherein, when the I/O card sharing settings management unit is referred to and the attributes of the server of the requester indicate that the I/O card set as the destination of the access request signal received from the switch is not available to the requester server, the I/O card sharing unit sends one of “master abort” and data in which every bit is 1 in response to the request signal from the server.
14. The computer system according to claim 9, wherein the I/O processor includes: an I/O card sharing settings management unit to set in advance attributes for identifying which server is allowed to use the I/O card and for discriminating whether the I/O card is shared or not; and an attribute setting unit for setting attributes in the I/O card sharing settings management unit via an input unit connected to the I/O processor.
15. The computer system according to claim 9, wherein the I/O card sharing unit includes an address information judging unit which compares header information in the access request signal received from the switch against preset information, and then outputs address information used in one of the request signal writing unit and the interruption causing unit, and wherein the address information judging unit sets the information in accordance with an I/O card write instruction executed upon activation of the server.
16. The computer system according to claim 10, wherein the I/O card includes: a command register which sets an instruction contained in the access request signal; an address register which designates a server memory space to which the response to the access request signal is sent; and a command chain processing unit which reads a data structure stored in the server memory, wherein the first instruction and the second instruction are composed of one activation request and executed by writing the activation request in the command register of the I/O card, wherein the request signal writing unit writes the request signal in the memory of the I/O processor, and also writes in the memory of the I/O processor the data structure set in the server memory, and wherein, after activated, the I/O card reads and processes the data structure set in the memory of the I/O processor.
17. A computer system, comprising: a plurality of servers; a switch connecting the plurality of servers to an I/O card from which data can be written in memory spaces of the servers; an I/O card sharing unit interposed between the switch and the I/O card to operate an access request signal and a response signal which are exchanged between the servers and the I/O card; and an I/O processor which can access the I/O card and the I/O card sharing unit and which has a memory and a processor to manage allocation of the I/O card to the servers, the memory accepting a data write from the I/O card sharing unit, the processor accepting an interruption command from the I/O card sharing unit, wherein the switch has a header processing unit and the I/O card sharing unit, the switch sending an access request signal to the I/O card sharing unit after attaching a destination of the access request signal and routing information of the requester to header information of the access request signal, the access request signal containing an instruction made by one of the servers to the I/O card and a base address specifying one of the memory spaces, the header processing unit transferring a response signal to a server specified as the destination of the response signal by header information of the response signal, the response signal containing a response from the I/O card to the server and a base address specifying the memory space, wherein the I/O card sharing unit includes: a request signal writing unit which writes the access request signal in a memory of the I/O card sharing unit when the instruction in the access request signal received from the header processing unit is a preset first instruction; an interruption causing unit which interrupts the I/O processor when the instruction in the access request signal received from the header processing unit is a preset second instruction; a header modifying unit which, when the response signal from the I/O card to the server is received, uses the base address
contained in the response signal to extract routing information of the requester, and sets the extracted routing information of the requester as a destination in the header information of the response signal; a base address modifying unit which deletes the routing information of the requester buried in the base address in the response signal; and a transmission unit which sends the response signal to the switch, wherein the I/O processor includes: an address conversion unit which, when the interruption happens, obtains the routing information of the requester from the header information of the access request signal and sets the routing information in a given significant bit of the base address contained in the access request signal that is written in the memory; a response address setting unit which sets, as the destination of the response from the I/O card, the base address set in the significant bit of the header information of the requester through the interruption operation; and an I/O card activating unit which sends the access request signal to the I/O card and activates an operation of the I/O card through the interruption operation, and wherein the I/O card sends a response signal of processing relevant to the access request signal to the I/O card sharing unit in response.
18. The computer system according to claim 11, wherein one of PCI and PCI-EXPRESS is employed to connect the servers with the switch, to connect the switch with the I/O card sharing unit, and to connect the I/O card sharing unit with the I/O card.
19. A method of allocating MM I/O addresses of an I/O card that is shared among servers run on one node, comprising the steps of: obtaining a physical MM I/O address of the I/O card and a physical MM I/O address area that is located at the physical MM I/O address; setting, as a virtual MM I/O address area, a value that is calculated by multiplying the obtained MM I/O address area by a maximum count of servers sharing the I/O card, determining, as a virtual MM I/O address that is used by one of the servers sharing the I/O card, a start address of an area subsequent to a used MM I/O area within an unused area of the virtual MM I/O address area, and allocating the determined virtual MM I/O address to the server, and creating allocation information which indicates relation between the allocated virtual MM I/O address, the physical MM I/O address, and one of the server and a server identifier unique to the server.
20. The method of allocating MM I/O addresses according to claim 19, wherein the I/O card has an interface compliant with a PCI standard, and wherein the method of allocating MM I/O addresses further comprises the steps of: requesting to secure bus numbers of a PCI configuration space used by the I/O card, device numbers, and function numbers as many as the maximum count of the logical servers sharing the I/O card; and setting, as a virtual MM I/O address area, a value multiplied by the maximum count of logical servers sharing the I/O card based on the request to the PCI configuration space.
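
The allocation steps recited in claim 19 can be illustrated with a minimal sketch. It is not the claimed implementation. The class names, the field layout of the allocation information, and the `virtual_base` parameter (where the virtual MM I/O address area begins) are assumptions introduced only for illustration. The virtual area is sized as the physical area multiplied by the maximum server count, each server receives the start address of the next unused slice, and a lookup returns the server identifier together with the physical address.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Allocation:
    """One row of the allocation information (hypothetical layout)."""
    server_id: int
    virtual_addr: int
    physical_addr: int


@dataclass
class VirtualMmioArea:
    physical_addr: int   # physical MM I/O address of the I/O card
    physical_size: int   # size of the physical MM I/O address area
    max_servers: int     # maximum count of servers sharing the card
    virtual_base: int    # assumption: start of the virtual MM I/O area
    table: List[Allocation] = field(default_factory=list)

    @property
    def virtual_size(self) -> int:
        # Virtual area = physical area multiplied by the maximum server count.
        return self.physical_size * self.max_servers

    def allocate(self, server_id: int) -> Allocation:
        if len(self.table) >= self.max_servers:
            raise RuntimeError("virtual MM I/O area is fully allocated")
        if any(e.server_id == server_id for e in self.table):
            raise ValueError("server already has a virtual MM I/O address")
        # Start address of the slice that follows the already used slices.
        virtual_addr = self.virtual_base + len(self.table) * self.physical_size
        entry = Allocation(server_id, virtual_addr, self.physical_addr)
        self.table.append(entry)
        return entry

    def translate(self, virtual_addr: int) -> Tuple[int, int]:
        """Return (server identifier, physical address) for an access."""
        offset = virtual_addr - self.virtual_base
        if not 0 <= offset < len(self.table) * self.physical_size:
            raise LookupError("address is not in an allocated virtual area")
        index, within = divmod(offset, self.physical_size)
        entry = self.table[index]
        return entry.server_id, entry.physical_addr + within


# Example values are arbitrary and chosen only for illustration.
area = VirtualMmioArea(physical_addr=0xFE000000, physical_size=0x1000,
                       max_servers=4, virtual_base=0xA0000000)
first = area.allocate(server_id=1)
second = area.allocate(server_id=2)
server, phys = area.translate(second.virtual_addr + 0x10)
print(server, hex(phys))
```

Because every server owns a distinct slice of the virtual area, the slice index alone is enough to recover the server identifier from an incoming access, which is the property the allocation information is meant to provide.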
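
The handling of routing information recited in claim 17 can likewise be sketched, under stated assumptions. The claims say only that the routing information is set in "a given significant bit" of the base address. The 64-bit address width, the 8-bit routing field in the topmost bits, and the dictionary used to model a response signal are hypothetical choices made for this example. The first function plays the role of the address conversion unit; the last one combines the header modifying unit and the base address modifying unit.

```python
from typing import Dict, Tuple

ADDR_BITS = 64                        # assumption: 64-bit base address
ROUTE_BITS = 8                        # assumption: width of the routing field
ROUTE_SHIFT = ADDR_BITS - ROUTE_BITS  # routing field sits in the top bits
ADDR_MASK = (1 << ROUTE_SHIFT) - 1    # bits left for the real base address


def embed_routing(base_address: int, requester_id: int) -> int:
    """Bury the requester's routing information in the upper address bits."""
    if not 0 <= requester_id < (1 << ROUTE_BITS):
        raise ValueError("requester id does not fit in the routing field")
    if base_address < 0 or base_address & ~ADDR_MASK:
        raise ValueError("base address overlaps the routing field")
    return (requester_id << ROUTE_SHIFT) | base_address


def split_routing(tagged_address: int) -> Tuple[int, int]:
    """Recover (requester id, original base address) from a tagged address."""
    return tagged_address >> ROUTE_SHIFT, tagged_address & ADDR_MASK


def route_response(response: Dict[str, int]) -> Dict[str, int]:
    """Set the destination from the address, then delete the buried bits."""
    requester_id, base_address = split_routing(response["base_address"])
    return {**response, "destination": requester_id,
            "base_address": base_address}


# The I/O card only sees the tagged address and echoes it in its response.
tagged = embed_routing(base_address=0x1000, requester_id=3)
response_from_card = {"base_address": tagged, "payload": 0xAB}
print(route_response(response_from_card))
```

The point of the design is that the I/O card needs no knowledge of which server issued the request: the address it was given carries that information, and it is stripped again before the response reaches the switch.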